Micro-RNA Scaffolds, Non-naturally Occurring Micro-RNAs, and Methods for Optimizing Non-naturally Occurring Micro-RNAs

ABSTRACT

The present disclosure provides non-naturally occurring miR-196a-2 miRNAs and non-naturally occurring miR-204 miRNAs. The non-naturally occurring miRNAs of the disclosure have mature strand sequences distinct from their endogenous counterparts. The disclosure also provides methods of selecting mature strand sequences that function optimally in non-naturally occurring miR-196a-2 miRNAs. The methods and compositions of the disclosure may be used to mediate gene silencing via the RNAi pathway.

RELATED APPLICATION INFORMATION

This application is being filed on 22 May 2008, as a PCT International Patent application in the name of Dharmacon, Inc., a U.S. national corporation, applicant for the designation of all countries except the U.S., and Melissa KELLEY, a citizen of the U.S., Amanda BIRMINGHAM, a citizen of the U.S., Jon KARPILOW, a citizen of the U.S., Anastasia KHVOROVA, a citizen of Russia, and Kevin SULLIVAN, a citizen of the U.S., applicants for the designation of the U.S. only, and claims priority to U.S. Provisional Patent Application Ser. No. 60/939,785 filed on 23 May 2007.

FIELD OF THE INVENTION

The present invention relates to the field of RNAi. In particular, the invention describes miRNA-based scaffolds into which targeting sequences can be integrated to form non-naturally occurring miRNAs that effectively mediate gene knockdown.

BACKGROUND

RNA interference (RNAi) is a near-ubiquitous pathway involved in post-transcriptional gene modulation. A key effector molecule of RNAi is the microRNA (miRNA or miR). These small, non-coding RNAs are transcribed as primary miRNAs (pri-miRNA) and processed in the nucleus by Drosha (a Type III ribonuclease) to generate short hairpin structures referred to as pre-miRNAs (FIG. 1). The resulting molecules are transported to the cytoplasm and processed by a second nuclease (Dicer) before being incorporated into the RNA Induced Silencing Complex (RISC). Interactions between the mature miRNA-RISC complex and messenger RNA (mRNA), particularly between the seed region of the miRNA guide strand (nucleotides 2-7) and regions of the 3′ UTR of the mRNA, lead to gene knockdown by transcript cleavage and/or translation attenuation.

While study of native substrates (miRNA) has garnered considerable interest in recent years, the RNAi pathway has also been recognized as a powerful research tool. Small double stranded RNAs (referred to as small interfering RNAs or siRNA) generated by synthetic chemistries or enzymatic methods can be introduced into cells by a variety of means (e.g. lipid mediated transfection, electroporation) and enter the pathway to target specific gene transcripts for degradation. As such, the RNAi pathway serves as a potent tool in the investigation of gene function, pathway analysis, and drug discovery, and is envisioned to have future applications as a therapeutic agent.

Though the use of synthetic siRNA serves the needs of most gene knockdown experiments, there are some instances where synthetic molecules are unsuitable. Some cell types are resistant to, or highly sensitive to, commonly used transfection methods and/or reagents. In still other instances, the needs of the experimental system require that gene knockdown be achieved for periods longer than those provided by synthetic molecules (typically 4-10 days).

Vector-based delivery of silencing reagents has previously been achieved using a range of delivery (e.g., lentiviral) and scaffold (simple hairpins, miRNA-based) configurations (Samakoglu et al., Nature Biotech.; Lei Y. S. et al., 2005; Leirdal and Sioud, 2002; Anderson et al., 2003; Grimm, D. et al., (2006) Nature Letters 441:537-541).

SUMMARY

In one aspect, the disclosure provides a non-naturally occurring miR-196a-2 miRNA which comprises a nucleic acid having a stem-loop structure. The stem of the stem-loop structure incorporates a mature strand-star strand duplex. The sequence of said mature strand is distinct from the sequence of the endogenous mature strand of miR-196a-2 and is at least partially complementary to a portion of a target RNA. The star strand is at least partially complementary to the mature strand. In certain embodiments, the mature strand is 19 nucleotides in length and comprises the sequence:

5′ U₁M₂M₃M₄M₅M₆M₇M₈M₉M₁₀M₁₁M₁₂M₁₃M₁₄M₁₅M₁₆M₁₇M₁₈M₁₉3′

in which M₂-M₁₉ are nucleotides, and the star strand comprises the sequence:

5′ S₁S₂S₃S₄S₅S₆S₇S₈S₉S₁₀S₁₁S₁₂S₁₃S₁₄S₁₅S₁₆S₁₇S₁₈G₁₉ 3′

in which S₁-S₁₈ are nucleotides, and where M₁₂ and S₈ are not complementary bases.
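The constraints stated for the 19-nucleotide embodiment (U at mature strand position 1, G at star strand position 19, and non-complementary bases at M₁₂ and S₈) can be checked mechanically. The following Python sketch is illustrative only. The function name `satisfies_duplex_constraints` is hypothetical, both strands are assumed to be written 5′→3′, and "complementary" is assumed here to cover G•U wobble pairs as well as Watson-Crick pairs.

```python
# Base pairs treated as "complementary" in this sketch (assumption:
# Watson-Crick pairs plus the G-U wobble pair).
COMPLEMENTARY_PAIRS = {
    ("A", "U"), ("U", "A"),
    ("G", "C"), ("C", "G"),
    ("G", "U"), ("U", "G"),
}


def satisfies_duplex_constraints(mature: str, star: str) -> bool:
    """Check the three positional constraints of the 19-nt embodiment.

    Both strands are given 5'->3'. T is accepted as a stand-in for U.
    """
    mature = mature.upper().replace("T", "U")
    star = star.upper().replace("T", "U")

    # Both strands are 19 nucleotides long in this embodiment.
    if len(mature) != 19 or len(star) != 19:
        return False
    # Mature strand position 1 is U.
    if mature[0] != "U":
        return False
    # Star strand position 19 is G.
    if star[18] != "G":
        return False
    # M12 and S8 must not be complementary bases.
    return (mature[11], star[7]) not in COMPLEMENTARY_PAIRS
```

A caller would pass a candidate mature strand and star strand and discard any pair for which the function returns `False`.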

In another aspect, the disclosure provides a non-naturally occurring miR-204 miRNA comprising a nucleic acid having a stem-loop structure. The stem of the stem-loop structure incorporates a mature strand-star strand duplex. The sequence of said mature strand is distinct from the sequence of the endogenous mature strand of miR-204 and is at least partially complementary to a portion of a target RNA. The star strand is at least partially complementary to the mature strand.

In another aspect, the disclosure provides a non-naturally occurring miR-196a-2 miRNA capable of being processed in a cell to yield a mature miRNA at least partially complementary to a portion of a target RNA. The sequence of the mature miRNA is different from the sequence of endogenous miR-196a-2 mature miRNA.

In another aspect, the disclosure provides a non-naturally occurring miR-204 miRNA capable of being processed in a cell to yield a mature miRNA at least partially complementary to a portion of a target RNA. The sequence of the mature miRNA is different from the sequence of endogenous miR-204 mature miRNA.

In a further aspect, the disclosure provides cells comprising the aforementioned non-naturally occurring miR-196a-2 or miR-204 miRNAs.

In another aspect, the disclosure provides methods of lowering the functional capacity of a target RNA in a cell. The methods involve contacting the cell with a vector capable of expressing a non-naturally occurring miR-196a-2 or miR-204 miRNA. The non-naturally occurring miRNA is processed in the cell to yield a mature miRNA having the sequence of the mature strand.

In a further aspect, the disclosure provides vectors suitable for the expression of a non-naturally occurring miR-196a-2 or miR-204 miRNA. The vector may comprise a promoter operably linked to a reporter gene comprising an artificial intron, with the non-naturally occurring miRNA located within the artificial intron or within a 3′ UTR. The vector may also be a retroviral vector, for example a lentiviral vector, in which expression of the non-naturally occurring miRNA is driven by a Long Terminal Repeat (LTR).

In a further aspect, the disclosure provides a method for selecting the sequence of a mature strand and a star strand of a non-naturally occurring miR-196a-2 miRNA capable of reducing the functional activity of a target RNA when expressed in a cell. The method involves analyzing the nucleotide sequence of the target RNA to identify a subsequence:

5′ R₁R₂R₃R₄R₅R₆R₇R₈R₉R₁₀R₁₁R₁₂R₁₃R₁₄R₁₅R₁₆R₁₇R₁₈A₁₉3′

in which R₁-R₁₈ are nucleotides selected according to at least one criterion selected from the following:

1) at R₁, G is favored over A, A is favored over U, and U is favored over C;
2) at R₂, A is favored over each of U and G, and each of U and G is favored over C;
3) at R₃, each of U, G, and A is favored over C;
4) at R₄, each of U, G, and A is favored over C;
5) at R₅, A is favored over each of U and G, and each of U and G is favored over C;
6) at R₇, A is favored over each of U and C, and each of U and C is favored over G;
7) at R₈, A is favored over each of U and G, and each of U and G is favored over C;
8) at R₁₂, A is favored over each of U and C, and each of U and C is favored over G;
9) at R₁₃, U is favored over each of G and A, and each of G and A is favored over C;
10) at R₁₄, U is favored over each of C and G, and each of C and G is favored over A;
11) at R₁₅, A is favored over each of C, G, and U;
12) said subsequence does not include a tetranucleotide sequence selected from the group consisting of AAAA, UUUU, GGGG, and CCCC;
13) said subsequence has a total G+C content of not more than 10; and
14) said subsequence has a G+C content of not more than 4 between R₁₂-R₁₈.

Once the subsequence is identified, a mature strand sequence is selected which is at least partially complementary to the subsequence. A star strand sequence is then selected which is at least partially complementary to the mature strand.
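The positional preferences in criteria 1-11 for the 19-nucleotide subsequence can be expressed as a simple ranking table. The Python sketch below is an illustration under stated assumptions, not the algorithm of the disclosure. The disclosure gives only orderings ("favored over"), so the numeric weights (one point per preference tier above the least favored), the summation into a single score, and the names `PREFERENCE_TIERS` and `positional_score` are all hypothetical.

```python
# Preference tiers per 1-based position, most favored tier first,
# transcribed from criteria 1-11 for the 19-nt subsequence.
# Bases grouped in one string are equally favored.
PREFERENCE_TIERS = {
    1: ["G", "A", "U", "C"],
    2: ["A", "UG", "C"],
    3: ["UGA", "C"],
    4: ["UGA", "C"],
    5: ["A", "UG", "C"],
    7: ["A", "UC", "G"],
    8: ["A", "UG", "C"],
    12: ["A", "UC", "G"],
    13: ["U", "GA", "C"],
    14: ["U", "CG", "A"],
    15: ["A", "CGU"],
}


def positional_score(subsequence: str) -> int:
    """Sum hypothetical tier weights over the positions in criteria 1-11.

    The least favored tier at a position scores 0 and each tier above it
    scores one more point. These weights are an assumption of this sketch.
    """
    seq = subsequence.upper().replace("T", "U")
    if len(seq) != 19 or seq[-1] != "A":
        raise ValueError("expected a 19-nt subsequence ending in A")

    score = 0
    for position, tiers in PREFERENCE_TIERS.items():
        base = seq[position - 1]
        for rank, tier in enumerate(tiers):
            if base in tier:
                score += len(tiers) - 1 - rank
                break
    return score
```

Candidate subsequences from a target RNA could be ranked by this score after applying the sequence-wide criteria 12-14 as separate filters.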

In a further aspect, the disclosure provides another method for selecting the sequence of a mature strand and a star strand of a non-naturally occurring miR-196a-2 miRNA capable of reducing the functional activity of a target RNA when expressed in a cell. The method involves analyzing the nucleotide sequence of the target RNA to identify a subsequence:

5′R₁R₂R₃R₄R₅R₆R₇R₈R₉R₁₀R₁₁R₁₂R₁₃R₁₄R₁₅R₁₆R₁₇R₁₈R₁₉R₂₀A₂₁3′

in which R₁-R₂₀ are nucleotides selected according to at least one criterion selected from the following:

1) at R₁, A is favored over each of U, C, and G;
2) at R₂, each of U, C, and A is favored over G;
3) at R₃, G is favored over A, A is favored over U, and U is favored over C;
4) at R₄, A is favored over each of U and G, and each of U and G is favored over C;
5) at R₅, A is favored over each of U and G, and each of U and G is favored over C;
6) at R₆, each of U and A is favored over C, and C is favored over G;
7) at R₇, A is favored over each of U and G, and each of U and G is favored over C;
8) at R₈, each of U, C, and A is favored over G;
9) at R₉, A is favored over each of U and C, and each of U and C is favored over G;
10) at R₁₀, A is favored over each of U and G, and each of U and G is favored over C;
11) at R₁₄, A is favored over each of U and C, and each of U and C is favored over G;
12) at R₁₅, A is favored over U, U is favored over G, and G is favored over C;
13) at R₁₆, U is favored over each of C, G, and A;
14) at R₁₇, A is favored over C, C is favored over U, and U is favored over G;
15) at R₁₉, each of U, A, and C is favored over G;
16) said subsequence does not include a tetranucleotide sequence selected from the group consisting of AAAA, UUUU, GGGG, and CCCC;
17) said subsequence has a total G+C content of not more than 10; and
18) said subsequence has a G+C content of not more than 4 between R₁₄-R₂₀.

Once the subsequence is identified, a mature strand sequence is selected which is at least partially complementary to the subsequence. A star strand sequence is then selected which is at least partially complementary to the mature strand.
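The sequence-wide criteria for the 21-nucleotide subsequence (no AAAA, UUUU, GGGG, or CCCC run; total G+C of not more than 10; G+C of not more than 4 between R₁₄-R₂₀) lend themselves to a sliding-window scan of the target RNA. The Python sketch below is a minimal illustration under stated assumptions. The names `passes_filters` and `candidate_sites` are hypothetical, all three filters are applied together even though the method requires only at least one criterion, and the mature strand is taken to be the full reverse complement of the subsequence, which is only one way of being "at least partially complementary".

```python
HOMOPOLYMER_RUNS = ("AAAA", "UUUU", "GGGG", "CCCC")
COMPLEMENT = str.maketrans("AUGC", "UACG")


def gc_count(seq: str) -> int:
    """Number of G and C bases in seq."""
    return sum(base in "GC" for base in seq)


def passes_filters(subsequence: str) -> bool:
    """Apply the three sequence-wide criteria to a 21-nt window ending in A."""
    if len(subsequence) != 21 or subsequence[-1] != "A":
        return False
    if any(run in subsequence for run in HOMOPOLYMER_RUNS):
        return False
    if gc_count(subsequence) > 10:
        return False
    # R14-R20 are 1-based positions, i.e. 0-based slice [13:20].
    return gc_count(subsequence[13:20]) <= 4


def candidate_sites(target_rna: str):
    """Yield (start, subsequence, mature_strand) for each passing window.

    start is the 0-based offset in the target RNA. The mature strand is
    the reverse complement of the subsequence, written 5'->3'.
    """
    rna = target_rna.upper().replace("T", "U")
    for start in range(len(rna) - 20):
        window = rna[start:start + 21]
        if passes_filters(window):
            mature = window.translate(COMPLEMENT)[::-1]
            yield start, window, mature
```

Because each accepted window ends in A, its reverse complement begins with U. Positional preferences (criteria 1-15) could then be used to rank the windows that survive these filters.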

Other aspects of the invention are disclosed herein.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1: Schematic drawing of the RNAi pathway. Drawing provides 1) a depiction of the processing of pri-miRNA→pre-miRNA→mature miRNA, and 2) the position where siRNA enter the pathway.

FIG. 2: Schematic drawing demonstrating the relative orientation of the targeting sequence (in the scaffold) with respect to the target sequence (mRNA) and the reverse complement.

FIG. 3: Schematic drawing identifying several of the key positions in the miR-196a-2 scaffold that can contain secondary structure and non-Watson-Crick base pairing. Positions in both the mature (targeting) and star strand are indicated.

FIG. 4: (A) provides the sequence, restriction sites, and important attributes (e.g., 5′ and 3′ splice sites, branch points, polypyrimidine tracts) of the artificial intron used in these studies. (B) provides a schematic of the position of the artificial intron in GFP (Green Fluorescent Protein). SD=splice donor, SA=splice acceptor. (C-E) provides the targeting and flanking sequences associated with miR-196a-2, -26b, and -204, respectively. Sequence provided in the 5′→3′ orientation. The underlined portions of the sequences are illustrated in detail in FIG. 6. (F) diagram of the dual luciferase reporter construct. Schematic diagram of psiCHECK-2 vector (Promega) used to construct cleavage-based reporter plasmids. Vector includes firefly (hluc) and Renilla (hRluc) luciferase genes. Target sequence is inserted in the 3′ UTR of hRluc.

FIG. 5: Screening of Ten Distinct miRNA Scaffolds. A screen of multiple miRNA scaffolds using the dual luciferase reporter construct identified three miRNAs (miR-204, miR-26b, and miR-196a-2) that exhibited high activity in the reporter assay. Horizontal line represents 80% knockdown.

FIG. 6: Diagram of the Changes Placed in the miRNA scaffolds. Wobble base pairs are indicated by •. (A) example of a miR-26b construct. Top: Box indicates position of the mature strand sequence (which is shown here as the endogenous mature strand sequence). Nucleotides in upper case format are substitutions that introduce BlpI and SacI restriction sites. Bottom: bar graph shows the effects of incorporating restriction sites on mature and star strand functionality. (B) example of a miR-204 construct. Top: box indicates position of the mature strand sequence (which is shown here as the endogenous mature strand sequence). Nucleotides in upper case format are substitutions that introduce BlpI and SacI restriction sites. Bottom: bar graph demonstrates the effects of incorporating restriction sites on mature and star strand functionality. (C) examples of miR-196a-2 constructs. Box indicates position of the mature strand sequence which is shown here as the endogenous 21 nucleotide mature strand sequence. Note, however, that the endogenous mature strand sequence of miR-196a-2 may actually be 22 nucleotides in length, and would thus extend one nucleotide (the g nucleotide indicated by *) further at the 3′ end than the sequence indicated in the box. Thus, the g indicated by the * may either be part of the mature strand or part of the scaffold (and the opposite nucleotide c may be part of the star strand or part of the scaffold) depending on how the scaffold is processed by Drosha and/or Dicer. BlpI site is a natural component of the scaffold. Nucleotides in upper case format are substitutions that introduce XbaI, ScaI, and SacI restriction sites. (D) bar graph demonstrates the effects of incorporating restriction sites on mature and star strand functionality. ScaI=mismatches in the scaffold structure; ScaI+=base pair changes have been introduced to eliminate mismatches; XbaI=introduces mismatches in the structure; XbaI+=base pair changes have been introduced to eliminate mismatches. All experiments associated with assessing activity of modified constructs were performed using the dual luciferase reporter constructs. (E) Examples of the stem-loop regions of miR-196a-2 and miR-204 scaffolds (M=mature strand, S=star strand). Note that the g indicated by * in the miR-196a-2 example may also be considered to be part of the mature strand sequence. (F) Example of a miR-204 scaffold. The site of mature strand and the star strand insertion is depicted schematically. Nucleotide substitutions (relative to endogenous miR-204) are indicated by use of upper case format. (G) Examples of miR-196a-2 scaffolds. The site of mature strand and the star strand insertion is depicted schematically. Nucleotide substitutions (relative to endogenous miR-196a-2) are indicated by use of upper case format. Note that the nucleotide g indicated by * may also be considered to be part of the mature strand sequence (and thus the opposite position c may also be considered to be part of the star strand sequence).

FIG. 7: (A) schematic of sequences used in the GAPDH walk. Sequences used in the GAPDH walk targeted a defined region in the GAPDH gene and represented sequential 2nt steps across the target region. The sequences representing each target position were synthesized and cloned into the miR-26b, -204, and -196a-2 scaffolds. Note: secondary structure that mimicked the native construct was incorporated into design when possible. (B) GAPDH target sequence that was inserted into the 3′ UTR of the hRluc reporter to assess functionality of various sequences. Upper case letters represent the actual targeted sequence. (C) silencing efficiency of the GAPDH walk when sequences are delivered as siRNA. (D) silencing efficiency of each sequence in the GAPDH walk when delivered in the miR-26b scaffold. (E) silencing efficiency of each sequence in the GAPDH walk when delivered in the miR-204 scaffold. (F) silencing efficiency of each sequence in the GAPDH walk when delivered in the miR-196a-2 scaffold. (G) silencing efficiency of each sequence in the GAPDH walk when delivered in the miR-196a-2 scaffold. Secondary structures were not preserved. (H) Bar graph showing the number of sequences that induced knockdown at each level in each backbone. Note: miR-196a-2 hp represents sequences that did not have secondary structure preserved.

FIG. 8. Desirable Traits for Functional Target Sequences. (A) an analysis of nucleotide prevalence at each position of functional sequences identified nucleotide preferences that could be incorporated into the miR-196a-2 algorithm. Y axis represents differential preference for nucleotides. X axis represents each nucleotide position. (B) a plot of total targeting sequence GC content vs functionality of all sequences in the miR-196a-2 walk. Results show that sequences that contain ten or fewer Gs and Cs in the targeting sequence have a greater tendency to exhibit high performance than sequences with higher numbers of GCs.

FIG. 9. (A) distinction between targeting sequences chosen by the siRNA rational design algorithm (U.S. patent application Ser. No. 10/940,892, filed Sep. 14, 2004, published as U.S. Pat. App. Pub. No. 2005/0255487) and the miR-196a-2 rational design algorithm. The sequence for the gene CDC2 was run through an siRNA design algorithm and the miR-196a-2 scaffold algorithm. As shown, the two algorithms pick drastically different targeting sequences, thus emphasizing the unrelated nature of the two technologies. (B,C) target sequences for MAPK1 and EGFR that were inserted into the dual luciferase reporter constructs. (D) a comparison of the performance of sequences targeting EGFR and MAPK1 designed with an siRNA algorithm (siRD) vs. sequences designed with the miR-196a-2 algorithm (shRD). Both sets of sequences were cloned into the artificial miR-196a-2 backbone and tested for the ability to knock down the target gene (hRluc) in the dual luciferase assay. Rationally designed siRNA sequences that were 1) converted into shRNAs, and 2) cloned into the miR-196a-2 backbone were also run for comparison. Note: secondary structure that matched that of the natural miR-196a-2 was included in the siRNA-to-shRNA design. Performance was measured using the dual luciferase reporter assay. (E) Performance of new miR-196a-2 algorithm in targeting additional genes including CDC2, CD28, CD69, and LAT. (F) Sequence of Zap70 target sequence inserted into the 3′ UTR hRluc multiple cloning site for the dual luciferase reporter construct. (G) Sequences of four non-functional inserts targeting the Zap70 gene identified a prevalence of GCs in the seed region of the mature strand. Dashed line represents the mature strand sequence (5′→3′), bold underline represents the position of the mature strand seed, solid box line represents GC runs in each sequence. (H) Nearest Neighbor analysis performed on the collection of functional sequences taken from the miR-196a-2 walk shows that as a collection, highly functional sequences have low GC content in the region of the mature strand seed. X-axis represents the position along the mature strand, Y axis represents the differential free-energy preference for functional sequences.

DETAILED DESCRIPTION

The term “artificial intron” refers to a specific sequence that has been designed to act as an intron (i.e., it has essential splice donor and acceptor sequences and other relevant properties) and has minimal secondary structure.

The term “rational design” refers to the application of a proven set of criteria that enhance the probability of identifying a sequence that will provide highly functional levels of gene silencing.

The term “reporter” or “reporter gene” refers to a gene whose expression can be monitored. For example, expression levels of a reporter can be assessed to evaluate the success of gene silencing by substrates of the RNAi pathway.

The term “RNA Induced Silencing Complex,” and its acronym “RISC,” refers to the set of proteins that complex with single-stranded polynucleotides such as mature miRNA or siRNA, to target nucleic acid molecules (e.g., mRNA) for cleavage, translation attenuation, methylation, and/or other alterations. Known, non-limiting components of RISC include Dicer, R2D2 and the Argonaute family of proteins, as well as strands of siRNAs and miRNAs.

The term “RNA interference” and the term “RNAi” are synonymous and refer to the process by which a polynucleotide (a miRNA or siRNA) comprising at least one polyribonucleotide unit exerts an effect on a biological process. The process includes, but is not limited to, gene silencing by degrading mRNA, attenuating translation, interactions with tRNA, rRNA, hnRNA, cDNA and genomic DNA, as well as methylation of DNA with ancillary proteins.

The term “gene silencing” refers to a process by which the expression of a specific gene product is lessened or attenuated by RNA interference. The level of gene silencing (also sometimes referred to as the degree of “knockdown”) can be measured by a variety of means, including, but not limited to, measurement of transcript levels by Northern Blot Analysis, B-DNA techniques, transcription-sensitive reporter constructs, expression profiling (e.g. DNA chips), qRT-PCR and related technologies. Alternatively, the level of silencing can be measured by assessing the level of the protein encoded by a specific gene. This can be accomplished by performing a number of studies including Western Analysis, measuring the levels of expression of a reporter protein that has e.g. fluorescent properties (e.g., GFP) or enzymatic activity (e.g. alkaline phosphatases), or several other procedures.

The terms “microRNA”, “miRNA”, or “miR” all refer to non-coding RNAs (and also, as the context will indicate, to DNA sequences that encode such RNAs) that are capable of entering the RNAi pathway and regulating gene expression. “Primary miRNA” or “pri-miRNA” represents the non-coding transcript prior to Drosha processing and includes the stem-loop structure(s) as well as flanking 5′ and 3′ sequences. “Precursor miRNAs” or “pre-miRNA” represents the non-coding transcript after Drosha processing of the pri-miRNA. The term “mature miRNA” can refer to the double stranded product resulting from Dicer processing of pre-miRNA or the single stranded product that is introduced into RISC following Dicer processing. In some cases, only a single strand of a miRNA enters the RNAi pathway. In other cases, two strands of a miRNA are capable of entering the RNAi pathway.

The term “mature strand” refers to the sequence in an endogenous miRNA or in a non-naturally occurring miRNA that is the full or partial reverse complement (RC) of (i.e., is fully or partially complementary to) a target RNA of interest. The terms “mature sequence,” “targeting strand,” “targeting sequence” and “guide strand” are synonymous with the term “mature strand” and are often used interchangeably herein.

The term “star strand” refers to the strand that is fully complementary or partially complementary to the mature strand in a miRNA. The terms “passenger strand” and “star strand” are interchangeable.

The term “target sequence” refers to a sequence in a target RNA or DNA that is partially or fully complementary to the mature strand. The target sequence can be described using the four bases of DNA (A, T, G, and C), or the four bases of RNA (A, U, G, and C). Target sequences can be determined randomly, or, more preferably, target sequences can be identified using an algorithm that identifies preferred target sequences based on one or more desired traits.

The term “target RNA” refers to a specific RNA that is targeted by the RNAi pathway, resulting in a decrease in the functional activity of the RNA. In some cases, the RNA target is a mRNA whose functional activity is its ability to be translated. In such cases, the RNAi pathway will decrease the functional activity of the mRNA by translational attenuation or by cleavage. In the instant disclosure, target RNAs are targeted by non-naturally occurring miRNAs. The term “target” can also refer to DNA.

The term “endogenous miRNA” refers to a miRNA produced in an organism through transcription of sequences that naturally are present in the genome of that organism. Endogenous miRNA can be localized in, for example, introns, open reading frames (ORFs), 5′ or 3′ untranslated regions (UTRs), or intergenic regions. The organism which produces an endogenous miRNA may be, without limitation, human (and other primates), mouse, rat, fly, worms, fish or other organisms that have an intact RNAi pathway.

The term “complementary” refers to the ability of polynucleotides to form base pairs with one another. Base pairs are typically formed by hydrogen bonds between nucleotide units in antiparallel polynucleotide strands. Complementary polynucleotide strands can base pair in the Watson-Crick manner (e.g., A to T, A to U, C to G), or in any other manner that allows for the formation of duplexes, including the wobble base pair formed between U and G. As persons skilled in the art are aware, when using RNA as opposed to DNA, uracil rather than thymine is the base that is considered to be complementary to adenosine. However, when a U is denoted in the context of the present invention, the ability to substitute a T is implied, unless otherwise stated.

Perfect complementarity or 100% complementarity refers to the situation in which each nucleotide unit of one polynucleotide strand can hydrogen bond with a nucleotide unit of a second polynucleotide strand. Partial complementarity refers to the situation in which some, but not all, nucleotide units of two strands can hydrogen bond with each other. For example, two strands are at least partially complementary when at least 6-7 base pairs can be formed over a stretch of about 19-25 nucleotides. Sequences are said to be “complementary” to one another when each sequence is the (partial or complete) reverse complement (RC) of the other. For example, the sequence 5′ GATC 3′ is perfectly complementary to its reverse complement sequence 3′ CTAG 5′. Sequences can also have wobble base pairing.
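The distinction between perfect and partial complementarity can be made concrete by counting the base pairs that two antiparallel strands can form. The Python sketch below is illustrative only. The function name `count_base_pairs` and its `allow_wobble` parameter are hypothetical, both strands are assumed to be written 5′→3′ and to have equal length, and only an ungapped, end-to-end alignment is considered, so bulges and offsets are ignored.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def count_base_pairs(strand_a: str, strand_b: str, allow_wobble: bool = True) -> int:
    """Count positions at which two equal-length strands can base pair.

    Both strands are given 5'->3'. strand_b is reversed so that the two
    strands are compared in antiparallel orientation. T is read as U.
    """
    a = strand_a.upper().replace("T", "U")
    b = strand_b.upper().replace("T", "U")
    if len(a) != len(b):
        raise ValueError("this sketch only handles strands of equal length")

    allowed = WATSON_CRICK | WOBBLE if allow_wobble else WATSON_CRICK
    # zip pairs the 5' end of a with the 3' end of b, and so on inward.
    return sum((x, y) in allowed for x, y in zip(a, reversed(b)))
```

Under these assumptions, two strands are perfectly complementary when the count equals the strand length, and partially complementary when the count is lower but non-zero. For the example in the text, 5′ GATC 3′ paired against 5′ GATC 3′ (that is, 3′ CTAG 5′) gives a count equal to the full length of four.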

The term “duplex” refers to a double stranded structure formed by two complementary or substantially complementary polynucleotides that form base pairs with one another, including Watson-Crick base pairs and U-G wobble pairs, which allows for a stabilized double stranded structure between polynucleotide strands that are at least partially complementary. The strands of a duplex need not be perfectly complementary for a duplex to form, i.e., a duplex may include one or more base mismatches.

A single polynucleotide molecule can possess antiparallel and complementary polynucleotide strands capable of forming a duplex with intramolecular base pairs. Such polynucleotides frequently have a stem-loop structure where the strands of the stem are separated by a loop sequence (which is predominantly single stranded) and are thus able to adopt a mutually antiparallel orientation. Stem-loop structures are well known in the art. Pre-miRNAs and pri-miRNAs often have one or more stem-loop structures in which the stem includes a mature strand-star strand duplex.

The term “nucleotide” refers to a ribonucleotide or a deoxyribonucleotide or modified form thereof, as well as an analog thereof. Nucleotides include species that comprise purines, e.g., adenine, hypoxanthine, guanine, and their derivatives and analogs, as well as pyrimidines, e.g., cytosine, uracil, thymine, and their derivatives and analogs. Nucleotide analogs include nucleotides having modifications in the chemical structure of the base, sugar and/or phosphate, including, but not limited to, 5-position pyrimidine modifications, 8-position purine modifications, modifications at cytosine exocyclic amines, and substitution of 5-bromo-uracil; and 2′-position sugar modifications, including but not limited to, sugar-modified ribonucleotides in which the 2′-OH is replaced by a group such as an H, OR, R, halo, SH, SR, NH₂, NHR, NR₂, or CN, wherein R is an alkyl moiety. Nucleotide analogs are also meant to include nucleotides with bases such as inosine, queuosine, xanthine, sugars such as 2′-methyl ribose, non-natural phosphodiester linkages such as methylphosphonates, phosphorothioates and peptides.

Modified bases refer to nucleotide bases such as, for example, adenine, guanine, cytosine, thymine, uracil, xanthine, inosine, and queuosine that have been modified by the replacement or addition of one or more atoms or groups. Some examples of types of modifications that can comprise nucleotides that are modified with respect to the base moieties include but are not limited to, alkylated, halogenated, thiolated, aminated, amidated, or acetylated bases, individually or in combination. More specific examples include, for example, 5-propynyluridine, 5-propynylcytidine, 6-methyladenine, 6-methylguanine, N,N,-dimethyladenine, 2-propyladenine, 2-propylguanine, 2-aminoadenine, 1-methylinosine, 3-methyluridine, 5-methylcytidine, 5-methyluridine and other nucleotides having a modification at the 5 position, 5-(2-amino)propyl uridine, 5-halocytidine, 5-halouridine, 4-acetylcytidine, 1-methyladenosine, 2-methyladenosine, 3-methylcytidine, 6-methyluridine, 2-methylguanosine, 7-methylguanosine, 2,2-dimethylguanosine, 5-methylaminoethyluridine, 5-methyloxyuridine, deazanucleotides such as 7-deaza-adenosine, 6-azouridine, 6-azocytidine, 6-azothymidine, 5-methyl-2-thiouridine, other thio bases such as 2-thiouridine and 4-thiouridine and 2-thiocytidine, dihydrouridine, pseudouridine, queuosine, archaeosine, naphthyl and substituted naphthyl groups, any O- and N-alkylated purines and pyrimidines such as N6-methyladenosine, 5-methylcarbonylmethyluridine, uridine 5-oxyacetic acid, pyridine-4-one, pyridine-2-one, phenyl and modified phenyl groups such as aminophenol or 2,4,6-trimethoxy benzene, modified cytosines that act as G-clamp nucleotides, 8-substituted adenines and guanines, 5-substituted uracils and thymines, azapyrimidines, carboxyhydroxyalkyl nucleotides, carboxyalkylamino nucleotides, and alkylcarbonylalkylated nucleotides. Modified nucleotides also include those nucleotides that are modified with respect to the sugar moiety, as well as nucleotides having sugars or analogs thereof that are not ribosyl. For example, the sugar moieties may be, or be based on, mannoses, arabinoses, glucopyranoses, galactopyranoses, 4′-thioribose, and other sugars, heterocycles, or carbocycles.

The term nucleotide is also meant to include what are known in the art as universal bases. By way of example, universal bases include, but are not limited to, 3-nitropyrrole, 5-nitroindole, or nebularine. The term “nucleotide” is also meant to include the N3′ to P5′ phosphoramidate, resulting from the substitution of a ribosyl 3′-oxygen with an amine group. Further, the term nucleotide also includes those species that have a detectable label, such as for example a radioactive or fluorescent moiety, or mass label attached to the nucleotide.

The term “polynucleotide” refers to polymers of two or more nucleotides, and includes, but is not limited to, DNA, RNA, DNA/RNA hybrids including polynucleotide chains of regularly and/or irregularly alternating deoxyribosyl moieties and ribosyl moieties (i.e., wherein alternate nucleotide units have an -OH, then an -H, then an -OH, then an -H, and so on at the 2′ position of a sugar moiety), and modifications of these kinds of polynucleotides, wherein the attachment of various entities or moieties to the nucleotide units at any position are included.

The term “ribonucleotide” and the term “ribonucleic acid” (RNA) refer to a modified or unmodified nucleotide or polynucleotide comprising at least one ribonucleotide unit. A ribonucleotide unit comprises a hydroxyl group attached to the 2′ position of a ribosyl moiety that has a nitrogenous base attached in N-glycosidic linkage at the 1′ position of a ribosyl moiety, and a moiety that either allows for linkage to another nucleotide or precludes linkage.

In one aspect, the present disclosure provides non-naturally occurring miRNAs (also sometimes referred to herein as “artificial miRNAs”) that are capable of reducing the functional activity of a target RNA. By “non-naturally occurring miRNA” (where miRNA in this context refers to a specific endogenous miRNA) is meant a pre-miRNA or pri-miRNA comprising a stem-loop structure(s) derived from a specific endogenous miRNA in which the stem(s) of the stem-loop structure(s) incorporates a mature strand-star strand duplex where the mature strand sequence is distinct from the endogenous mature strand sequence of the specific referenced endogenous miRNA. The sequence of the star strand of a non-naturally occurring miRNA of the disclosure is also distinct from the endogenous star strand sequence of the specific referenced endogenous miRNA.

The sequences of a non-naturally occurring miRNA outside of the mature strand-star strand duplex (i.e., the loop and the regions of the stem on either side of the mature strand-star strand duplex, and optionally including flanking sequences, as detailed below) are referred to herein as “miRNA scaffold,” “scaffold portion,” or simply “scaffold.” Thus, in another aspect, the disclosure provides miRNA scaffolds useful for the generation of non-naturally occurring miRNAs. A non-naturally occurring miRNA of the disclosure comprises a miRNA scaffold derived from (i.e. at least 60% identical to, up to and including 100% identical to) a specific endogenous miRNA and further comprises a mature strand-star strand duplex that is not derived from that same specific endogenous miRNA. A single miRNA scaffold of the disclosure can be used to provide an almost unlimited number of different non-naturally occurring miRNAs, each having the same miRNA scaffold sequence but different mature strand and star strand sequences.

Note that one skilled in the art will appreciate that the term “a non-naturally occurring miRNA” may refer not only to an RNA molecule, but also in certain contexts to a DNA molecule that encodes such an RNA molecule.

Endogenous miRNAs from which the miRNA scaffold sequences of the disclosure are derived include, but are not limited to, miR-26b, miR-196a-2, and miR-204, from humans (miRNA Accession numbers MIMAT0000083, MIMAT0000226, and MIMAT0000265, respectively, available at http://microrna.sanger.ac.uk/sequences/index.shtml), as well as miR-26b, miR-196a-2, and miR-204 from other species. In this context, two miRNAs are judged to be equivalent if the mature strand of each sequence is identical or nearly identical. Hence, the term “a non-naturally occurring miR-196a-2 miRNA” refers to a pre-miRNA or pri-miRNA comprising a miR-196a-2 miRNA scaffold (i.e. a miRNA scaffold derived from miR-196a-2 or an equivalent sequence) and further comprising a mature strand and star strand sequence that is distinct from the endogenous mature strand and star strand sequences of endogenous miR-196a-2. A non-naturally occurring miR-196a-2 miRNA thus comprises a stem-loop structure(s) derived from miR-196a-2 (from any species) in which the stem(s) of the stem-loop structure(s) incorporates a mature strand-star strand duplex where the mature strand sequence is distinct from the endogenous mature strand sequence of miR-196a-2. Similarly, the term “a non-naturally occurring miR-204 miRNA” refers to a pre-miRNA or pri-miRNA comprising a miR-204 miRNA scaffold (i.e. a miRNA scaffold derived from miR-204) and a mature strand and star strand sequence that is not derived from miR-204. A non-naturally occurring miR-204 miRNA thus comprises a stem-loop structure(s) derived from miR-204 (from any species) in which the stem(s) of the stem-loop structure(s) incorporates a mature strand-star strand duplex where the mature strand sequence is distinct from the endogenous mature strand sequence of miR-204.

The miRNA scaffold sequence may be the same as the specifically referenced endogenous miRNA (e.g., miR-196a-2 or miR-204), or it may be different from the specifically referenced endogenous miRNA by virtue of the addition, substitution, or deletion of one or more nucleotides relative to the endogenous miRNA sequence. Such modifications can enhance the functionality of the miRNA scaffold by, for example, introducing restriction sites. Restriction sites can facilitate cloning strategies e.g. by allowing the introduction of mature strand and star strand sequences into the miRNA scaffold, and by allowing introduction of the non-naturally occurring miRNA into a vector construct so that it may be expressed in a cell. In addition, modifications in the miRNA scaffold may be made in order to minimize the functionality of the star strand of a non-naturally occurring miRNA in the RNAi machinery. In addition, nucleotide changes can be made in the miRNA scaffold to minimize the length of the mature strand and the star strand, and yet still yield efficient and specific gene silencing activity. Sequence modifications can also be made to the miRNA scaffold in order to minimize the ability of the star strand in the resulting non-naturally occurring miRNA to interact with RISC. In still another example, the number of nucleotides present in the loop of the miRNA scaffold can be reduced to improve manufacturing efficiency.

The miRNA scaffold may also include additional 5′ and/or 3′ flanking sequences (for example, where it is desired to provide non-naturally occurring miRNA as a pri-miRNA that is first processed by Drosha to yield a pre-miRNA). Such flanking sequences flank the 5′ and/or 3′ ends of the stem-loop and range from about 5 nucleotides in length to about 600 nucleotides in length, preferably from about 5 nucleotides to about 150 nucleotides in length. The flanking sequences may be the same as the endogenous sequences that flank the 5′ end and/or the 3′ end of the stem-loop structure of the endogenous miRNA from which the miRNA scaffold is derived, or they may be different by virtue of the addition, deletion, or substitution of one or more base pairs. For example, a miR-196a-2 miRNA scaffold (and a non-naturally occurring miR-196a-2 miRNA obtained by cloning a mature strand sequence and a star strand sequence thereinto) may include 5′ and/or 3′ flanking sequence which is the same as the endogenous sequences that flank the 5′ end and/or the 3′ end of the stem-loop structure of endogenous miR-196a-2 miRNA.

The 5′ and 3′ flanking sequences can also be derived from the endogenous sequences that flank the 5′ end and/or the 3′ end of the stem-loop structure of an endogenous miRNA other than the specifically referenced miRNA. For example, in some embodiments a miR-196a-2 miRNA scaffold includes 5′ and/or 3′ flanking sequences that are derived from the endogenous sequences that flank the 5′ end and/or the 3′ end of the stem-loop structure of another miRNA, such as miR-204. In other examples, the 5′ and/or 3′ flanking sequences may be artificial sequences designed or demonstrated to have minimal effects on miRNA folding or processing or functionality. In other examples, the 5′ and/or 3′ flanking sequences are natural sequences that enhance or do not interfere with the folding or processing of the non-naturally occurring miRNA by Drosha, Dicer, or other components of the RNAi pathway. In addition, flanking sequences can be designed or selected to have one or more nucleotide motifs and/or secondary structures that enhance processing of the non-naturally occurring miRNA to generate the mature miRNA. Thus for instance, if the non-naturally occurring miRNA is intended to be located within an intron for expression purposes, the flanking sequences in the miRNA scaffold can be modified to contain, for instance, splice donor and acceptor sites that enhance excision of the non-naturally occurring miRNA from the expressed gene. Alternatively, if a unique sequence or sequences are identified that enhance miRNA processing, such sequences can be inserted into the 5′ and/or 3′ flanking sequences. Such sequences might include AU-rich sequences, sequences that have affinity with one or more components of the RNAi machinery, and sequences that form secondary structures that enhance processing by the RNAi machinery. In one embodiment, the flanking sequence comprises the artificial intron sequence in FIG. 4.

In still other embodiments, the 5′ and/or 3′ flanking sequences in the miRNA scaffold may be derived from a different species than the other portions of the miRNA scaffold. For example, the 5′ and/or 3′ flanking sequences in a miR-196a-2 miRNA scaffold may be derived from the flanking regions of rat miR-196a-2 (or indeed from the flanking regions of another rat miR) whereas the remainder of the miR-196a-2 miRNA scaffold is derived from human miR-196a-2.

In some embodiments, the sequence of the mature strand of a non-naturally occurring miRNA is the same as, or is derived from, the sequence of the mature strand of a miRNA distinct from the miRNA from which the miRNA scaffold portion of the non-naturally occurring miRNA is derived. For example, a non-naturally occurring miR-196a-2 miRNA (which has a miRNA scaffold structure derived from miR-196a-2) may have the mature strand of another endogenous miRNA such as miR-16 or miR-15. In these embodiments, the sequence of the mature strand may be modified relative to the endogenous sequence in order to optimize the functional activity of the mature strand in the particular miRNA scaffold.

The miRNA scaffolds of the disclosure 1) contain well-defined stem and loop structures, 2) have minimal secondary structures, 3) are modifiable to facilitate cloning, 4) permit a non-naturally occurring miRNA to be expressed from a Pol II or Pol III promoter, 5) are amenable to changes that alter loop size and sequence, 6) permit a non-naturally occurring miRNA to function when maintained epigenetically (i.e. as plasmids) or when inserted into the host genome, and 7) are amenable to insertion or substitution of foreign sequences at the position of the endogenous mature miRNA sequence in order to generate a non-naturally occurring miRNA. In cases where the scaffold is associated with e.g. a reporter gene (such as GFP) or selectable marker gene (such as puromycin), or both, preferred miRNA scaffolds can perform regardless of whether they are inserted in the 5′ UTR, 3′ UTR, intronic sequences or ORF of said genes. In one preferred configuration, a fusion construct comprising the gene encoding GFP is functionally fused to a gene encoding puromycin with the sequence encoding Peptide 2A functionally separating the two coding sequences, and the artificial miRNA-196a-2 is inserted in the 3′ UTR of the fusion construct. FIG. 6F and FIG. 6G illustrate non-limiting examples of miR-204 and miR-196a-2 scaffolds, respectively, in which the site of mature strand and star strand insertion is depicted schematically. Nucleotide substitutions (relative to endogenous miR-204 and miR-196a-2) are indicated by use of upper case format; such substitutions introduce restriction sites (indicated) to facilitate cloning.

As disclosed above, a non-naturally occurring miRNA of the disclosure comprises a miRNA scaffold derived from a specific endogenous miRNA and further comprises a mature strand-star strand duplex that is not derived from that same specific endogenous miRNA. The mature strand of the non-naturally occurring miRNAs of the disclosure can be the same length as, longer than, or shorter than the mature strand of the endogenous miRNA from which the scaffold is derived. The exact length of the mature strand of a non-naturally occurring miRNA of the disclosure is not important so long as the resulting non-naturally occurring miRNA is capable of being processed by Drosha and/or Dicer.

In some embodiments, the mature strand sequences inserted into the miRNA scaffolds of the present disclosure are randomly selected from the target RNA of interest, e.g. the mature strand is the reverse complement of a subsequence from the target RNA, where the subsequence from the target RNA is chosen randomly.
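By way of non-limiting illustration, the random selection described above can be sketched in Python. The function names, the default window length of 19 nucleotides, and the use of strict Watson-Crick complementarity are assumptions made for this sketch only; they are not requirements of the disclosure.

```python
import random

# Watson-Crick complement map for an RNA alphabet (assumption: A/C/G/U only).
_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the reverse complement of an RNA string written 5' to 3'."""
    return "".join(_COMPLEMENT[base] for base in reversed(rna.upper()))


def random_mature_strand(target_rna, length=19, rng=None):
    """Pick a random target subsequence and return (start, mature_strand).

    `start` is the 0-based offset of the chosen site in the target RNA;
    the mature strand is the reverse complement of that site.
    """
    rng = rng or random.Random()
    if len(target_rna) < length:
        raise ValueError("target RNA is shorter than the requested length")
    start = rng.randrange(len(target_rna) - length + 1)
    site = target_rna[start:start + length]
    return start, reverse_complement(site)
```

Passing a seeded `random.Random` instance as `rng` makes the selection reproducible, which may be convenient when the same candidate set is to be regenerated.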

In one series of embodiments, the mature strand sequences inserted into the miRNA scaffolds of the disclosure are rationally designed. Designing sequences for a miRNA scaffold includes two steps: identification of preferred target sites in the gene to be targeted, and optimizing the scaffold around the selected sequences to ensure structural elements are preserved in the expressed molecule. Identifying target sites can be achieved by several methods. According to one embodiment, the disclosure provides a method for identifying attributes that are 1) important for and/or 2) detrimental to functionality of a targeting sequence embedded in a scaffold. The method comprises: (a) selecting a set of randomly-selected sequences targeting a gene (i.e. mature strand sequences that are at least partially complementary to a target RNA); (b) incorporating those sequences into the scaffold of choice; (c) determining the relative functionality of each sequence in the context of the scaffold; (d) determining how the presence or absence of at least one variable affects functionality; and (e) developing an algorithm for selecting functional sequences using the information of step (d).

Methods for detecting the efficiency of target knockdown (step (c)) by sequences include quantitating target gene mRNA and/or protein levels. For mRNA, standard techniques including PCR-based methods, northern blots, and branched DNA can be applied. For protein quantitation, methods based on ELISA, western blotting, and the like can be used to assess the functionality of sequences. One preferred protein detection assay is based on a reporter system such as the dual-luciferase reporter vector system (e.g. psiCheck, Promega) containing short target sequences for each targeting sequence that can be used to assess the functionality of each sequence.

Side-by-side analysis of functional and non-functional sequences can identify positions or regions where particular nucleotides, thermodynamic profiles, secondary structures, and more, enhance or negatively affect functionality. By merging these elements (both positive and negative) in a weighted fashion, a selection algorithm can be assembled.

In one embodiment, the present disclosure provides a method for identifying functional target sites for the miR-196a-2 scaffold. The method comprises applying selection criteria (identified by bioinformatic analysis of functional and non-functional sets of sequences) to a set of potential sequences that comprise about 18-23 base pairs (although longer or shorter sequences are also specifically contemplated), where the selection criteria are non-target specific criteria and species independent. Preferred selection criteria include both positively and negatively weighted elements associated with 1) nucleotides at particular positions, 2) regiospecific thermodynamic profiles at particular positions, 3) elimination or incorporation of possible secondary structures within the targeting sequence, and other factors. Application of one or more of these selection criteria allows rational design of sequences to be inserted into the miR-196a-2 scaffold.

In one embodiment, the selection criteria are embodied in a formula. For example, Formula I provided below may be used to determine nucleotides 1-19 (numbered in the 5′ to 3′ direction) of the mature strand (which may be a 19 nucleotide to, for example, a 25 nucleotide mature strand) of highly functional non-naturally occurring miR-196a-2 gene targeting sequences.

Formula I: for nucleotides 1-19 of the reverse complement of the target sequence.

Score=(−500)*A1+(43.8)*T1+(−21.3)*C1+(−500)*G1+(21.3)*T5+(18.8)*A6+(−3)*T6+(25)*A7+(−41.3)*G7+(21.3)*T8+(−16.3)*C8+(37.5)*T12+(−18.8)*G12+(27.5)*T13+(−22.5)*C13+(21.3)*T15+(−17.5)*G15+(−18.8)*G16+(−18.8)*G17+(16.3)*T18+(−17.5)*G18+(21.3)*T19+(28.8)*C19+(−35)*G19

where “A” represents an adenine, “G” represents a guanine, “T” represents a thymine, and “C” represents a cytosine. In addition, the number following the symbol for each base (e.g. A1) refers to the position of the base in the reverse complement of the target mRNA. As such, the reverse complement (RC) nucleotide 1 in the algorithm is the complement of nucleotide 19 in the target mRNA (see FIG. 2). Furthermore, nucleotide 19 of the target mRNA base pairs or wobble pairs with nucleotide 1 of the mature strand which is inserted into the miRNA scaffold. Table 1 below indicates the aligned nucleotide positions, where M₁-M₁₉ are nucleotides 1-19 of the mature strand, R₁-R₁₉ are nucleotides 1-19 of the target RNA, and S₁-S₁₉ are nucleotides 1-19 of the star strand:

TABLE 1

3′ S₁₉ S₁₈ S₁₇ S₁₆ S₁₅ S₁₄ S₁₃ S₁₂ S₁₁ S₁₀ S₉ S₈ S₇ S₆ S₅ S₄ S₃ S₂ S₁ 5′ (star strand)
5′ M₁ M₂ M₃ M₄ M₅ M₆ M₇ M₈ M₉ M₁₀ M₁₁ M₁₂ M₁₃ M₁₄ M₁₅ M₁₆ M₁₇ M₁₈ M₁₉ 3′ (mature strand)
3′ R₁₉ R₁₈ R₁₇ R₁₆ R₁₅ R₁₄ R₁₃ R₁₂ R₁₁ R₁₀ R₉ R₈ R₇ R₆ R₅ R₄ R₃ R₂ R₁ 5′ (target RNA)

Detailed studies of functional and non-functional sequences identified a preference for a “U” at position 1 of the mature strand of non-naturally occurring miR-196a-2 miRNAs. Therefore, a U at position 1 of the mature strand is highly desirable. Taking this into account, an “A” or “G” at position 1 of the reverse complement of the target is highly negatively weighted (−500). A “C” at position 1 is also selected against (−21.3), albeit the weighting is less severe because a “C” at this position still allows a GU wobble to occur in the mature-target duplex. In contrast, a “T” at position 1 of the RC of the target is highly desirable (+43.8). (Note: “U” refers to the nucleotide as it appears in the RNA molecule; “T” refers to the nucleotide as it appears in the cDNA of the RNA molecule.)

Note that there is evidence to support both a 21 nucleotide endogenous mature strand (with the 3′ terminus being GG) and a 22 nucleotide endogenous mature strand (with the 3′ terminus being GGG) for miR-196a-2. If the mature strand of a non-naturally occurring miR-196a-2 is longer than 19 nucleotides (with additional nucleotides added to the 3′ end), then the star strand will also include additional nucleotides at its 5′ end such that the star strand and the mature strand are the same length. For example, if the mature strand is 21 nucleotides long, then the star strand will be 21 nucleotides in length also, with two extra bases appearing 5′ of S₁ in the alignment above. In embodiments where the algorithm of Formula I is used and where the mature strand is a 21 nucleotide sequence, bases 2-19 of the mature strand (nucleotide 1 of the mature strand is preferably a U) are determined by the algorithm of Formula I, and bases 20 and 21 may be (but need not be) Gs to mimic the endogenous miR-196a-2 mature strand sequence. If bases 20-21 of the mature strand are GG, then bases at the opposite position on the star strand can be CC, UU, UC (as in the endogenous mature strand-star strand sequence), or CU (thus forming a base pair, either Watson-Crick or wobble). Alternatively, positions 20 and 21 can be GG and these nucleotides can be mismatched with nucleotides at opposing positions in the star strand (e.g. G-G mismatches or G-A mismatches). Alternatively, positions 20 and 21 can consist of sequences that base pair with the target RNA. In this case the nucleotides on the opposing star strand can generate Watson-Crick pairings, wobble pairings, or mismatches.

Similarly, if the mature strand is 22 nucleotides long (as noted above, there is evidence to support the existence of both 21 nucleotide and 22 nucleotide endogenous mature strands for miR-196a-2), then the star strand will be 22 nucleotides in length also, with three extra bases appearing 5′ of S₁ in the alignment above. In embodiments where the algorithm of Formula I is used and where the mature strand is a 22 nucleotide sequence, bases 2-19 of the mature strand (nucleotide 1 of the mature strand is preferably a U) are determined by the algorithm of Formula I, and bases 20, 21, and 22 may be (but need not be) Gs to mimic the endogenous miR-196a-2 mature strand sequence. If bases 20-22 of the mature strand are GGG, then bases at the opposite position on the star strand can form either Watson-Crick base pairs or wobble pairs. For example, if bases 20-22 of the mature strand are GGG, then the star strand sequence opposite this sequence could be CUC, which mimics the endogenous mature strand-star strand duplex at this position. Alternatively, positions 20, 21, and 22 can be GGG and these nucleotides can be mismatched with nucleotides at opposing positions in the star strand (e.g. G-G mismatches or G-A mismatches). Alternatively, positions 20-22 can consist of sequences that base pair with the target RNA. In this case the nucleotides on the opposing star strand can generate Watson-Crick pairings, wobble pairings, or mismatches.

Formula I refers to the reverse complement of the target sequence nucleotide position preferences. As such, Formula I is applied, for example, by: (1) determining the sequence that is the reverse complement of a target RNA; and (2) applying the algorithm to this sequence to identify the 19 nucleotide sub-sequence(s) with a desirable score in the algorithm (e.g. with the highest, or one of the highest scores relative to other sub-sequences). The identified sequences are then introduced into a miR-196a-2 miRNA scaffold to yield non-naturally occurring miR-196a-2 miRNAs.
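As a non-limiting sketch, steps (1) and (2) above can be written in Python as follows. The weight table transcribes Formula I, with the sign preceding the A7 term taken to be “+” (as it is in Formula II); the function names, the `top` parameter, and the choice to report window offsets in reverse-complement coordinates are assumptions of this sketch only.

```python
# Formula I weights keyed by (position, base); positions are 1-based and
# refer to the reverse complement (RC) of the target, written with T.
FORMULA_I = {
    (1, "A"): -500.0, (1, "T"): 43.8, (1, "C"): -21.3, (1, "G"): -500.0,
    (5, "T"): 21.3,
    (6, "A"): 18.8, (6, "T"): -3.0,
    (7, "A"): 25.0, (7, "G"): -41.3,   # assumption: A7 term is added
    (8, "T"): 21.3, (8, "C"): -16.3,
    (12, "T"): 37.5, (12, "G"): -18.8,
    (13, "T"): 27.5, (13, "C"): -22.5,
    (15, "T"): 21.3, (15, "G"): -17.5,
    (16, "G"): -18.8,
    (17, "G"): -18.8,
    (18, "T"): 16.3, (18, "G"): -17.5,
    (19, "T"): 21.3, (19, "C"): 28.8, (19, "G"): -35.0,
}

_COMP = {"A": "T", "T": "A", "U": "A", "G": "C", "C": "G"}


def reverse_complement_dna(seq):
    """Step (1): reverse complement of the target, returned with T for U."""
    return "".join(_COMP[b] for b in reversed(seq.upper()))


def score_formula_i(candidate):
    """Sum the Formula I weights over a 19-nucleotide RC sub-sequence."""
    seq = candidate.upper().replace("U", "T")
    if len(seq) != 19:
        raise ValueError("Formula I scores exactly 19 nucleotides")
    return sum(FORMULA_I.get((i, b), 0.0) for i, b in enumerate(seq, start=1))


def rank_candidates(target, top=5):
    """Step (2): score every 19-nt window of the RC, best scores first.

    Each result is (score, offset_in_rc, window); offsets are 0-based.
    """
    rc = reverse_complement_dna(target)
    scored = [(score_formula_i(rc[i:i + 19]), i, rc[i:i + 19])
              for i in range(len(rc) - 18)]
    return sorted(scored, key=lambda item: -item[0])[:top]
```

Bases and positions that do not appear in Formula I contribute zero to the score, which is how the sketch treats the unweighted positions 2-4, 9-11, and 14.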

Formula I can also be expressed as a series of criteria, where each criterion represents the rank order preference for a base of the RC of the target sequence:

criterion 1: at position 1 of the RC of the target sequence, T is favored over C, and G and A are each disfavored

criterion 2: at position 5 of the RC of the target sequence, T is favored over each of G, C, and A

criterion 3: at position 6 of the RC of the target sequence, A is favored over each of G and C; and each of G and C is favored over T

criterion 4: at position 7 of the RC of the target sequence, A is favored over each of C and T; and each of C and T is favored over G

criterion 5: at position 8 of the RC of the target sequence, T is favored over each of A and G; and each of A and G is favored over C

criterion 6: at position 12 of the RC of the target sequence, T is favored over each of A and C; and each of A and C is favored over G

criterion 7: at position 13 of the RC of the target sequence, T is favored over each of A and G; and each of A and G is favored over C

criterion 8: at position 15 of the RC of the target sequence, T is favored over each of A and C; and each of A and C is favored over G

criterion 9: at position 16 of the RC of the target sequence, each of A, C, and T is favored over G

criterion 10: at position 17 of the RC of the target sequence, each of A, C, and T is favored over G

criterion 11: at position 18 of the RC of the target sequence, T is favored over each of A and C; and each of A and C is favored over G

criterion 12: at position 19 of the RC of the target sequence, C is favored over T; T is favored over A; A is favored over G

In some embodiments, one or more (or all) of the criteria are applied to identify mature strand sequences. For example, the criteria are applied by (1) determining the sequence that is the reverse complement of a target RNA; and (2) applying one or more of the criteria to identify a 19 nucleotide sub-sequence(s). The identified sequences are then introduced into a miR-196a-2 miRNA scaffold to yield non-naturally occurring miR-196a-2 miRNAs. In preferred embodiments, position 1 of the mature strand is a T/U.

One skilled in the art will appreciate that Formula I can also be equivalently expressed so that it refers directly to target RNA nucleotide preferences (R₁-R₁₉ in Table 1). This is done simply by replacing each nucleotide preference in Formula I with the complementary nucleotide at the opposing position in the target RNA (see Table 1). For example, if the original version of Formula I referred to a “G” at M₂, then the reformulated version would refer to a “C” at R₁₈. Once a desirable target RNA sequence is identified, its reverse complement (preferably with a T/U at position 1) is introduced into a miR-196a-2 miRNA scaffold where it forms the mature strand.
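The re-expression described above amounts to a change of coordinates: per Table 1, position p of the mature strand opposes position 20 − p of a 19 nucleotide target site, and the base is replaced by its complement. A minimal Python sketch follows; the function name and the `length` parameter are hypothetical, and only strict Watson-Crick complements are considered (wobble pairs are ignored).

```python
# Complement of a base written in the RC/mature frame (T or U accepted),
# expressed in the RNA alphabet of the target.
_TO_TARGET = {"A": "U", "T": "A", "U": "A", "G": "C", "C": "G"}


def to_target_frame(position, base, length=19):
    """Map a (position, base) preference on the mature strand to the target.

    Returns (target_position, target_base), both in the numbering of
    Table 1 (R1 at the 5' end of the target site).
    """
    if not 1 <= position <= length:
        raise ValueError("position lies outside the scored region")
    return length + 1 - position, _TO_TARGET[base.upper()]


# The example given in the text: "G" at M2 corresponds to "C" at R18.
print(to_target_frame(2, "G"))
```

Calling the same function with `length=21` gives the corresponding mapping for a 21 nucleotide site, where position p opposes position 22 − p.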

Similarly, when Formula I is used to describe the target site, it can also be expressed as a series of criteria, where each criterion represents the rank order preference for a base of the target RNA (i.e. bases R₁-R₁₉ in the table above):

criterion 1: at R₁, G is favored over A, A is favored over U, and U is favored over C;

criterion 2: at R₂, A is favored over each of U and G, and each of U and G is favored over C;

criterion 3: at R₃, each of U, G, and A is favored over C;

criterion 4: at R₄, each of U, G, and A is favored over C;

criterion 5: at R₅, A is favored over each of U and G, and each of U and G is favored over C;

criterion 6: at R₇, A is favored over each of U and C, and each of U and C is favored over G;

criterion 7: at R₈, A is favored over each of U and G, and each of U and G is favored over C;

criterion 8: at R₁₂, A is favored over each of U and C, and each of U and C is favored over G;

criterion 9: at R₁₃, U is favored over each of G and A, and each of G and A is favored over C;

criterion 10: at R₁₄, U is favored over each of C and G, and each of C and G is favored over A;

criterion 11: at R₁₅, A is favored over each of C, G, and U;

criterion 12: at R₁₉, A is favored over G, and each of C and U is disfavored.

One or more (or all) of the criteria may be applied to determine a desirable target RNA sequence. For example, one or more of the criteria are applied to a target RNA sequence to identify a 19 nucleotide sub-sequence(s); the reverse complement of the identified sub-sequence (preferably with a “T” at position 1) is then introduced into a miR-196a-2 miRNA scaffold as a mature strand to yield non-naturally occurring miR-196a-2 miRNA. In preferred embodiments, at least criterion 12 is selected such that R₁₉ is A.

In another embodiment, the disclosure provides another algorithm for determining bases 1-21 (numbered in the 5′ to 3′ direction) of the mature strand (which may be a 21-25 nucleotide mature strand) of a highly functional non-naturally occurring miR-196a-2 miRNA; see Formula II below:

Formula II: for nucleotides 1-21 of a reverse complement of a target sequence.

Score=(−500)*A1+(43.75)*T1+(−21.25)*C1+(−500)*G1+(−36.7)*C3+(−33.3)*A5+(50)*T5+(−46.7)*C5+(37.5)*A6+(25)*A7+(29.2)*T7+(−41.25)*G7+(21.25)*T8+(−16.25)*C8+(45.8)*T12+(−18.75)*G12+(58.3)*T13+(−37.5)*C13+(−36.7)*C14+(21.25)*T15+(−17.5)*G15+(−36.7)*C16+(−18.75)*G16+(40)*T17+(−18.75)*G17+(16.25)*T18+(−17.5)*G18+(−33.3)*A19+(21.25)*T19+(28.75)*C19+(−35)*G19+(−23.3)*C20+(50)*T21

where “A” represents an adenine, “G” represents a guanine, “T” represents a thymine, and “C” represents a cytosine. In addition, the number following the symbol for each base (e.g. A1) refers to the position of the base in the reverse complement of the target mRNA. As such, the reverse complement nucleotide 1 in the algorithm is the complement of nucleotide 21 in the target mRNA (see FIG. 2). Table 2 below indicates the aligned nucleotide positions, where M₁-M₂₁ are nucleotides 1-21 of the mature strand, R₁-R₂₁ are nucleotides 1-21 of the target RNA, and S₁-S₂₁ are nucleotides 1-21 of the star strand:

TABLE 2

3′ S₂₁ S₂₀ S₁₉ S₁₈ S₁₇ S₁₆ S₁₅ S₁₄ S₁₃ S₁₂ S₁₁ S₁₀ S₉ S₈ S₇ S₆ S₅ S₄ S₃ S₂ S₁ 5′ (star strand)
5′ M₁ M₂ M₃ M₄ M₅ M₆ M₇ M₈ M₉ M₁₀ M₁₁ M₁₂ M₁₃ M₁₄ M₁₅ M₁₆ M₁₇ M₁₈ M₁₉ M₂₀ M₂₁ 3′ (mature strand)
3′ R₂₁ R₂₀ R₁₉ R₁₈ R₁₇ R₁₆ R₁₅ R₁₄ R₁₃ R₁₂ R₁₁ R₁₀ R₉ R₈ R₇ R₆ R₅ R₄ R₃ R₂ R₁ 5′ (target RNA)

Note that if the mature strand is longer than 21 nucleotides (for example, 22 or 23 nucleotides in length), then the star strand will also include additional nucleotides at its 5′ end such that the star strand and the mature strand are the same length. For example, if the mature strand is 22 nucleotides long (which may be the length of the endogenous miR-196a-2 mature strand), then the star strand will be 22 nucleotides in length also, with one extra base appearing 5′ of S₁ in the alignment above. In embodiments where the algorithm of Formula II is used and where the mature strand is a 22 nucleotide sequence, bases 2-21 of the mature strand sequence (nucleotide 1 of the mature strand is preferably a U) are determined by the algorithm of Formula II and base 22 may be, for example, G (which is the same nucleotide as in the endogenous 22 nucleotide mature strand of miR-196a-2). If base 22 of the mature strand is G, then the base at the opposite position on the star strand can be C (as in the endogenous miR-196a-2) or U (thus forming a base pair, either Watson-Crick or wobble). Alternatively, position 22 can be G and can be mismatched with the nucleotide at the opposing position in the star strand (e.g. a G-G mismatch or G-A mismatch). Alternatively, position 22 can be a nucleotide that base pairs with the target RNA. In this case the nucleotide on the opposing star strand can generate Watson-Crick pairings, wobble pairings, or mismatches.

As with Formula I, Formula II refers to the reverse complement of the target sequence nucleotide position preferences. As such, Formula II is applied, for example, by: (1) determining the sequence that is the reverse complement of a target RNA; and (2) applying the algorithm to this sequence to identify the 21 nucleotide sub-sequence(s) with a desirable score in the algorithm (e.g. with the highest, or one of the highest scores relative to other sub-sequences). The identified sequences are then introduced into a miR-196a-2 miRNA scaffold to yield non-naturally occurring miR-196a-2 miRNAs.
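One way to apply Formula II programmatically, offered only as an illustrative sketch, is to parse the formula text into a weight table and then sum the weights over a 21 nucleotide candidate. The regular expression, function names, and the use of ASCII hyphens for the minus signs in the transcribed string are assumptions of this sketch.

```python
import re

# Formula II transcribed as text (ASCII "-" used for the minus sign).
FORMULA_II_TEXT = (
    "(-500)*A1+(43.75)*T1+(-21.25)*C1+(-500)*G1+(-36.7)*C3+(-33.3)*A5"
    "+(50)*T5+(-46.7)*C5+(37.5)*A6+(25)*A7+(29.2)*T7+(-41.25)*G7"
    "+(21.25)*T8+(-16.25)*C8+(45.8)*T12+(-18.75)*G12+(58.3)*T13"
    "+(-37.5)*C13+(-36.7)*C14+(21.25)*T15+(-17.5)*G15+(-36.7)*C16"
    "+(-18.75)*G16+(40)*T17+(-18.75)*G17+(16.25)*T18+(-17.5)*G18"
    "+(-33.3)*A19+(21.25)*T19+(28.75)*C19+(-35)*G19+(-23.3)*C20+(50)*T21"
)

# One term looks like "(weight)*<base><position>"; both the ASCII hyphen
# and the typographic minus sign are accepted in the weight.
_TERM = re.compile(r"\(([-−]?\d+(?:\.\d+)?)\)\*([ACGT])(\d+)")


def parse_formula(text):
    """Turn formula text into a {(position, base): weight} table."""
    weights = {}
    for value, base, position in _TERM.findall(text):
        weights[(int(position), base)] = float(value.replace("−", "-"))
    return weights


FORMULA_II = parse_formula(FORMULA_II_TEXT)


def score(candidate, weights=FORMULA_II, length=21):
    """Score an RC sub-sequence of the given length against a weight table."""
    seq = candidate.upper().replace("U", "T")
    if len(seq) != length:
        raise ValueError("candidate has the wrong length for this formula")
    return sum(weights.get((i, b), 0.0) for i, b in enumerate(seq, start=1))
```

Because the weights are parsed from text rather than hard-coded, the same `parse_formula` and `score` functions could be reused for Formula I by supplying its text and `length=19`.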

Formula II can also be expressed as a series of criteria, where each criterion represents the rank order preference for a base of the reverse complement of the target sequence, e.g.:

criterion 1: at position 1 of the reverse complement of the target sequence, T>C, and each of A and G is disfavored

criterion 2: at position 3 of the reverse complement of the target sequence, A,T,G>C

criterion 3: at position 5 of the reverse complement of the target sequence, T>G>A>C

criterion 4: at position 6 of the reverse complement of the target sequence, A>G,C,T

criterion 5: at position 7 of the reverse complement of the target sequence, T>A>C>G

criterion 6: at position 8 of the reverse complement of the target sequence, T>A,G>C

criterion 7: at position 12 of the reverse complement of the target sequence, T>A,C>G

criterion 8: at position 13 of the reverse complement of the target sequence, T>A,G>C

criterion 9: at position 14 of the reverse complement of the target sequence, A,G,T>C

criterion 10: at position 15 of the reverse complement of the target sequence, T>A,C>G

criterion 11: at position 16 of the reverse complement of the target sequence, A,T>G>C

criterion 12: at position 17 of the reverse complement of the target sequence, T>A,C>G

criterion 13: at position 18 of the reverse complement of the target sequence, T>A,C>G

criterion 14: at position 19 of the reverse complement of the target sequence, C>T>A>G

criterion 15: at position 20 of the reverse complement of the target sequence, A,G,T>C

criterion 16: at position 21 of the reverse complement of the target sequence, T>A,G,C

where > indicates that one base is preferred over another, e.g. W>X,Y>Z indicates that W is favored over each of X and Y; and each of X and Y is favored over Z. In some embodiments, one or more (or all) of the criteria are applied to identify mature strand sequences. For example, the criteria are applied by (1) determining the sequence that is the reverse complement of a target RNA; and (2) applying one or more of the criteria to identify a 21 nucleotide sub-sequence(s). The identified sequences are then introduced into a miR-196a-2 miRNA scaffold to yield non-naturally occurring miR-196a-2 miRNAs. In preferred embodiments, position 1 of the mature strand is a “T”.
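The “>” notation lends itself to a simple machine-readable form. The following Python sketch, which is illustrative only, parses a rank-order expression into preference tiers and flags the positions at which a candidate carries a base from the least preferred tier. The names `parse_ranking`, `is_favored`, `CRITERIA_II`, and `least_preferred_positions` are hypothetical; criterion 1 is omitted from the table because it is worded in terms of “disfavored” bases rather than as a full rank order.

```python
def parse_ranking(expr):
    """Parse e.g. 'T>A,G>C' into {'T': 0, 'A': 1, 'G': 1, 'C': 2}.

    Tier 0 is the most preferred; bases separated by commas share a tier.
    """
    ranks = {}
    for tier, group in enumerate(expr.replace(" ", "").split(">")):
        for base in group.split(","):
            ranks[base] = tier
    return ranks


def is_favored(expr, x, y):
    """True if base x sits in a strictly better tier than base y."""
    ranks = parse_ranking(expr)
    return ranks[x] < ranks[y]


# Criteria 2-16 for Formula II, keyed by position in the reverse complement.
CRITERIA_II = {
    3: "A,T,G>C", 5: "T>G>A>C", 6: "A>G,C,T", 7: "T>A>C>G", 8: "T>A,G>C",
    12: "T>A,C>G", 13: "T>A,G>C", 14: "A,G,T>C", 15: "T>A,C>G",
    16: "A,T>G>C", 17: "T>A,C>G", 18: "T>A,C>G", 19: "C>T>A>G",
    20: "A,G,T>C", 21: "T>A,G,C",
}


def least_preferred_positions(candidate, criteria=CRITERIA_II):
    """List positions where the candidate has a base from the worst tier."""
    seq = candidate.upper().replace("U", "T")
    flagged = []
    for position, expr in sorted(criteria.items()):
        ranks = parse_ranking(expr)
        if ranks[seq[position - 1]] == max(ranks.values()):
            flagged.append(position)
    return flagged
```

A candidate with few or no flagged positions satisfies more of the criteria; how many criteria to enforce is left open, consistent with “one or more (or all)” above.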

As with Formula I, Formula II can also be equivalently expressed so that it refers directly to target RNA nucleotide preferences (R₁-R₂₁ in Table 2). This is done simply by replacing each nucleotide preference in Formula II with the complementary nucleotide in the target RNA (see Table 2).

Similarly, Formula II can also be expressed as a series of criteria, where each criterion represents the rank order preference for a base of the target RNA (i.e. bases R₁-R₂₁ in the table above):

criterion 1: at R₁, A is favored over each of U, C, and G;

criterion 2: at R₂, each of U, C, and A is favored over G;

criterion 3: at R₃, G is favored over A, A is favored over U, and U is favored over C;

criterion 4: at R₄, A is favored over each of U and G, and each of U and G is favored over C;

criterion 5: at R₅, A is favored over each of U and G, and each of U and G is favored over C;

criterion 6: at R₆, each of U and A is favored over C, and C is favored over G;

criterion 7: at R₇, A is favored over each of U and G, and each of U and G is favored over C;

criterion 8: at R₈, each of U, C, and A is favored over G;

criterion 9: at R₉, A is favored over each of U and C, and each of U and C is favored over G;

criterion 10: at R₁₀, A is favored over each of U and G, and each of U and G is favored over C;

criterion 11: at R₁₄, A is favored over each of U and C, and each of U and C is favored over G;

criterion 12: at R₁₅, A is favored over U, U is favored over G, and G is favored over C;

criterion 13: at R₁₆, U is favored over each of C, G, and A;

criterion 14: at R₁₇, A is favored over C, C is favored over U, and U is favored over G;

criterion 15: at R₁₉, each of U, A, and C is favored over G;

criterion 16: at R₂₁, A is preferred over G, and each of U and C is disfavored.

One or more (or all) of the criteria may be applied to determine a desirable target sequence. For example, one or more of the criteria are applied to a target RNA sequence to identify one or more nucleotide sub-sequences; the reverse complement of each identified sequence (preferably with a T/U at position 1) is then introduced into a miR-196a-2 miRNA scaffold as a mature strand to yield a non-naturally occurring miR-196a-2 miRNA. Again, in preferred embodiments position 1 of the mature strand is a “T/U”.

Additional weighted elements that focus on regiospecific factors, particularly overall GC content, GC content in the seed region, and the appearance of tetranucleotides, can be added to further enhance the functionality of Formulas I, II, or derivatives thereof. For example, these include any of the following elements in the mature strand:

a. −3* (# GCs)

b. −100 IF AT LEAST 1 “AAAA”

c. −100 IF AT LEAST 1 “TTTT”

d. −100 IF AT LEAST 1 “GGGG”

e. −100 IF AT LEAST 1 “CCCC”

f. −100 IF >4 GCs IN 2-8

g. −100 IF >10 GCs

Where:

-   “# of GCs” refers to the number of G and C nucleotides in the reverse complement of the target (or in the target RNA when Formula I or II is expressed as target RNA preferences)
-   “AAAA” refers to a tetranucleotide containing all As in the reverse complement of the target (equivalent to “UUUU” in the target RNA when Formula I or II is expressed as target RNA preferences)
-   “TTTT” refers to a tetranucleotide containing all Ts (equivalent to “AAAA” in the target RNA when Formula I or II is expressed as target RNA preferences)
-   “GGGG” refers to a tetranucleotide containing all Gs (equivalent to “CCCC” in the target RNA when Formula I or II is expressed as target RNA preferences)
-   “CCCC” refers to a tetranucleotide containing all Cs (equivalent to “GGGG” in the target RNA when Formula I or II is expressed as target RNA preferences)
-   “>4 GCs in 2-8 of mature” refers to more than four G and/or C nucleotides anywhere in positions 2-8 of the mature strand (i.e. the seed region), (equivalent to more than four G and/or C nucleotides anywhere in positions R₁₄-R₂₀ when Formula II is expressed as RNA target preferences, or in positions R₁₂-R₁₈ when Formula I is expressed as RNA target preferences).
-   “>10 GCs” refers to more than ten G and/or C nucleotides anywhere in the reverse complement of the target (equivalent to more than ten G and/or C nucleotides anywhere in the target RNA when Formula I or II is expressed as RNA target preferences).
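As a minimal sketch, weighted elements a-g can be summed for a candidate mature strand as shown below. The function name `additional_weight` is hypothetical, and the sketch assumes the elements are simply added together and that the result would then be added to the raw score of Formula I or II.

```python
def additional_weight(mature: str) -> int:
    """Sum weighted elements a-g for a mature strand written 5'->3'."""
    seq = mature.upper().replace("U", "T")
    gc_total = sum(base in "GC" for base in seq)
    gc_seed = sum(base in "GC" for base in seq[1:8])  # positions 2-8

    score = -3 * gc_total                      # element a
    for run in ("AAAA", "TTTT", "GGGG", "CCCC"):
        if run in seq:                         # elements b-e
            score -= 100
    if gc_seed > 4:                            # element f
        score -= 100
    if gc_total > 10:                          # element g
        score -= 100
    return score
```

Under these assumptions a strand with eight G/C nucleotides and one “TTTT” run scores −3 × 8 − 100 = −124.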

Similarly, when Formula I and Formula II are expressed as a series of criteria (as described above), these additional weighted elements may also be expressed as additional criteria. For example, when Formula I or Formula II is expressed as a series of criteria, the additional weighted elements may be expressed as the following additional criteria, one or more (or all) of which may be applied to select desirable sequences:

-   A) the reverse complement of the target does not include a tetranucleotide sequence selected from the group consisting of AAAA, UUUU, GGGG, and CCCC (or, equivalently, the target RNA subsequence does not include AAAA, UUUU, GGGG, or CCCC);
-   B) the reverse complement of the target has a total G+C content of not more than 10 (or, equivalently, the target RNA subsequence does not have a total G+C content of more than 10);
-   C) the mature strand has a G+C content of not more than 4 in the seed region (or, equivalently, the bases of the target RNA subsequence that are opposite the seed region of the mature strand have a G+C content of not more than 4).

It should be noted that when describing the mature strand, the nucleotide U can be used to describe the RNA sequence, or the nucleotide T can be used to describe the cDNA sequence for the RNA.

Additional weighted factors that focus on eliminating target sequences that can have secondary structures (e.g. hairpins) can also be added to selection algorithms.

Furthermore, any of the methods of selecting sequences can further comprise selecting either for or against sequences that contain motifs that induce cellular stress. Such motifs include, for example, toxicity motifs (see US2005/0203043, published Sep. 15, 2005). The above-described algorithms may be used with or without a computer program that allows for the inputting of the sequence of the target and automatically outputs the optimal targeting sequences. The computer program may, for example, be accessible from a local terminal or personal computer, over an internal network or over the Internet.

Furthermore, any of the methods of selecting sequences can further comprise selecting for or against targeting sequences that have particular seed region (positions 2-7 or 2-8 of the mature strand) sequences. In one non-limiting example, targeting sequences that have seeds that show complete identity to one of the seeds of one or more endogenously expressed microRNAs can be eliminated. In another example, seeds that have medium or high seed complement frequencies can be eliminated. Full descriptions of the importance of seeds having medium or high seed complement frequencies can be found in U.S. Ser. No. 11/724,346, filed Mar. 15, 2007.
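The first example, elimination of candidates whose seed is identical to an endogenous seed, can be sketched as follows. The function names `seed` and `drop_endogenous_seeds` are hypothetical, and the collection of endogenous seeds is assumed to be supplied by the user from whatever miRNA annotation is in use; no such list is provided here.

```python
def seed(mature: str, end: int = 7) -> str:
    """Return the seed of a mature strand: positions 2-7 (end=7) or 2-8 (end=8)."""
    return mature.upper().replace("T", "U")[1:end]


def drop_endogenous_seeds(candidates, endogenous_seeds, end: int = 7):
    """Keep only candidates whose seed is absent from `endogenous_seeds`.

    `endogenous_seeds` is a user-supplied iterable of seed strings of the
    same length convention (6-mers for end=7, 7-mers for end=8).
    """
    blocked = {s.upper().replace("T", "U") for s in endogenous_seeds}
    return [c for c in candidates if seed(c, end) not in blocked]
```

The same filter may be run before or after scoring, since the order of these steps is generally not critical.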

Once optimal mature strand sequences have been obtained, they are introduced in miR-196a-2 scaffolds as described above. The star strand is (for the most part) the reverse complement of the mature strand, but preferably has some alterations to create local structure that mimics the structure of the mature strand-star strand duplex of endogenous miR-196a-2. Star strand attributes, for example, may include one or more of the following shown in FIG. 3 and described in detail below:

-   1. When position 1 of the mature strand is a U (which, as discussed above, is preferable but not mandatory), the star strand position opposite is preferably a G to ensure it will always wobble pair.
-   2. If position 5 of the mature strand is G or T (U), then the star strand position opposite it is preferably altered to be T (U) or G (respectively) to create a wobble pair.
-   3. If the mature strand has something other than G or T (U) at position 5, then the star strand position opposite is designed to generate a standard Watson-Crick pair.
-   4. A mismatch is preferably created between position 12 of the mature strand and the opposite position of the star strand. This can be achieved by the relevant position of the star strand having the same base as position 12 of the mature strand.
-   5. If the mature strand is 18 nucleotides or longer in length, then the same criteria that are applied to position 5 of the mature strand and the opposite position of the star strand are similarly applied to position 18 of the mature strand and the opposite position of the star strand. Specifically, if position 18 of the mature strand is G or T (U), then the star strand position opposite it is altered to be T (U) or G (respectively) to create a wobble pair. If the mature strand has something other than G or T (U) at position 18, then the star strand position opposite this position is designed to generate a standard Watson-Crick pair.
-   6. If the mature strand is 19 nucleotides or longer in length, then the same criteria that are applied to position 5 of the mature strand and the opposite position of the star strand are similarly applied to position 19 of the mature strand and the opposite position of the star strand. Specifically, if position 19 of the mature strand is G or T (U), then the star strand position opposite it is altered to be T (U) or G (respectively) to create a wobble pair. If the mature strand has something other than G or T (U) at position 19, then the star strand is designed to generate a standard Watson-Crick pair.
-   7. If the mature strand is 21 nucleotides or longer in length, then the same criteria that are applied to position 5 of the mature strand and the opposite position of the star strand are similarly applied to position 21 of the mature strand and the opposite position of the star strand. Specifically, if position 21 of the mature strand is G or T (U), then the star strand position opposite it is altered to be T (U) or G (respectively) to create a wobble pair. If the mature strand has something other than G or T (U) at position 21, then the star strand is designed to generate a standard Watson-Crick pair.
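The star strand attributes above can be sketched as a per-position pairing rule. The sketch below returns, for each mature strand position, the base placed opposite it; it does not assemble the 5′→3′ star strand, because the star strand numbering is defined in tables not reproduced in this sketch. It assumes that every position not named in attributes 1-7 (and position 1 when it is not U) receives a standard Watson-Crick partner. The function name `opposite_bases` is hypothetical.

```python
WATSON_CRICK = {"A": "U", "U": "A", "G": "C", "C": "G"}
WOBBLE = {"G": "U", "U": "G"}
WOBBLE_POSITIONS = (5, 18, 19, 21)


def opposite_bases(mature: str) -> str:
    """Return the star-strand base opposite each mature-strand position.

    The result is indexed by mature-strand position (index 0 = position 1);
    it is not the 5'->3' star strand sequence.
    """
    seq = mature.upper().replace("T", "U")
    paired = []
    for position, base in enumerate(seq, start=1):
        if position == 1 and base == "U":
            paired.append("G")                 # attribute 1: force a G-U wobble
        elif position == 12:
            paired.append(base)                # attribute 4: same base, so a mismatch
        elif position in WOBBLE_POSITIONS and base in WOBBLE:
            paired.append(WOBBLE[base])        # attributes 2, 5, 6, 7: wobble pair
        else:
            paired.append(WATSON_CRICK[base])  # attribute 3 and all other positions
    return "".join(paired)
```

Positions 18, 19, and 21 are handled only when the mature strand is long enough to contain them, which matches the length conditions of attributes 5-7.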

The star strand positions opposite the referenced mature strand positions are provided in Tables 1-2 above. One or more of these additional criteria can be combined with Formulas I or II to enhance the performance of the targeting sequence inserted into the scaffold (e.g. the miR-196a-2 scaffold).

It is important to note that in many cases, the order in which some of the steps described above are performed is not critical. Thus, for instance, sequences can be scored by the algorithm(s) and subsequently, high scoring sequences can be screened to eliminate seeds with undesirable properties. Alternatively, a list of potential sequences can be generated and screened to eliminate undesirable seeds, and the remaining sequences can then be evaluated by the algorithm(s) to identify functional targets.

Each formula produces a different range of possible raw scores. In order to make scores from different formulas more comparable and easier to evaluate, mathematical methods can be employed to normalize raw scores derived from each formula. Different normalization equations can exist for each formula. Preferably, the normalization equation is chosen to produce scores in the 0-100 range for all (or almost all) design sequences. When planning to conduct gene silencing, one should choose sequences by comparing the raw scores generated by a formula, or by comparing the normalized scores between formulas. In general, a higher-scoring sequence should be used.
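No particular normalization equation is specified above. Purely as an illustrative assumption, a linear rescaling between a chosen minimum and maximum raw score for a given formula, clipped to the 0-100 range, could be written as follows; the function name `normalize` and its parameters are hypothetical.

```python
def normalize(raw: float, raw_min: float, raw_max: float) -> float:
    """Linearly map a raw score onto 0-100, clipping values outside the range.

    `raw_min` and `raw_max` are formula-specific bounds chosen by the user.
    """
    if raw_max <= raw_min:
        raise ValueError("raw_max must exceed raw_min")
    scaled = 100.0 * (raw - raw_min) / (raw_max - raw_min)
    return max(0.0, min(100.0, scaled))
```

Clipping reflects the statement that the 0-100 range need only hold for all or almost all design sequences.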

Non-naturally occurring miRNAs of the disclosure (including those where the mature strand is rationally designed according to Formula I or Formula II) can be expressed in a variety of vector construct systems including plasmids and viral vectors that either maintain sequences epigenetically or insert into the host genome. By way of example, a 21 nucleotide mature strand sequence where (A) bases 2-19 are determined by Formula I (position M₁ of the mature strand is preferably a T/U) or are derived from the mature strand sequence of a miR other than miR-196a-2; and (B) bases 20-21 are each G (which is the endogenous sequence for the 21 nucleotide mature strand of miR-196a-2) may be introduced into a miR-196a-2 miRNA scaffold, along with a 21 nucleotide star strand (TC-S₁-S₁₉), to yield a vector insert having the following sequence:

SEQ ID NO: 24
5′ TGATCTGTGGCT + [M₁-M₁₉-GG] + GATTGAGTTTTGAAC + [TC-S₁-S₁₉] + AGTTACATCAGTCGGTTTTCG 3′

The reverse complement of this sequence can be annealed, cloned into the appropriate vectors, and expressed to lower the functional capacity of the target RNA, e.g. to provide long term silencing of a gene of interest. Note that as there is evidence to support the existence of both a 21 nucleotide and a 22 nucleotide endogenous miR-196a-2 mature strand, it is possible that the aforementioned sequence, when expressed in cells, will be processed to yield a 21 nucleotide mature strand (M₁-M₁₉-GG) and/or a 22 nucleotide mature strand (M₁-M₁₉-GGG). Thus the G that is underlined in the aforementioned sequence (the first G following [M₁-M₁₉-GG]) may be either part of the miR-196a-2 scaffold or part of the mature strand.
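By way of illustration only, the layout of SEQ ID NO: 24 can be assembled from a 19 nucleotide M₁-M₁₉ segment and a 19 nucleotide S₁-S₁₉ segment as sketched below. The fixed segments are copied from the sequence above; the names `build_insert` and `reverse_complement` are hypothetical, and sequences are handled in the DNA alphabet.

```python
FIVE_PRIME = "TGATCTGTGGCT"
MIDDLE = "GATTGAGTTTTGAAC"
THREE_PRIME = "AGTTACATCAGTCGGTTTTCG"
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def build_insert(mature_1_19: str, star_1_19: str) -> str:
    """Assemble the SEQ ID NO: 24 layout from M1-M19 and S1-S19."""
    m = mature_1_19.upper().replace("U", "T")
    s = star_1_19.upper().replace("U", "T")
    if len(m) != 19 or len(s) != 19:
        raise ValueError("M1-M19 and S1-S19 must each be 19 nucleotides")
    # 5' flank + [M1-M19-GG] + middle segment + [TC-S1-S19] + 3' flank
    return FIVE_PRIME + m + "GG" + MIDDLE + "TC" + s + THREE_PRIME


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return dna.upper().translate(DNA_COMPLEMENT)[::-1]
```

With 19 nucleotide inputs the assembled insert is 12 + 21 + 15 + 21 + 21 = 90 nucleotides long.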

Preferred viral vectors include but are not limited to lentiviral vectors (e.g. HIV, FIV), retroviral vectors, adenoviral vectors, adeno-associated virus, and rabies vectors. In all of these cases, non-naturally occurring miRNAs can be transcribed as non-coding RNAs (e.g. from a pol III promoter) or can be associated with a messenger RNA transcribed by a pol II promoter. In one embodiment, the promoter is a tissue-specific promoter. In another embodiment, the promoter is a regulatable promoter (e.g. a tet promoter or a Reoswitch™). The promoter sequence can be derived from the host being targeted or can be taken from the genome(s) of another organism. Thus, for instance, the promoter can be a viral promoter such as a CMV, HIV, FIV, or RSV promoter sequence (such as a promoter found in a Long Terminal Repeat (LTR)). The sequences encoding the non-naturally occurring miRNA can be positioned in a variety of positions with regard to other elements associated with the vector system. For instance, the sequences encoding the non-naturally occurring miRNA can be associated with a gene that is expressed from a pol II promoter and inserted in the 5′ and/or 3′ UTR of a gene, or in one or more introns of a gene. In one preferred embodiment, the sequence encoding a non-naturally occurring miRNA is associated with a marker and/or reporter gene, including but not limited to a fluorescent reporter (e.g. GFP, YFP, RFP, and BFP), an enzymatic reporter (e.g. luciferase), a drug resistance marker (e.g. puromycin), or other genes whose expression does not significantly alter the physiological properties of the cell. In another instance, expression of the non-naturally occurring miRNA can be unrelated to the expression of a gene (i.e. transcribed as a non-coding sequence from a pol III promoter). In some instances, the regulation can incorporate multiple elements described above, for instance combining a regulatable promoter (e.g. P_tet) with a tissue-specific promoter to provide a tissue-specific regulatable expression system.

The number of non-naturally occurring miRNAs associated with a particular vector construct can also vary. In one embodiment, a single non-naturally occurring miRNA is expressed from a vector. In another embodiment, two or more non-naturally occurring miRNAs (i.e. a pool) are expressed from a vector. Where two or more non-naturally occurring miRNAs are expressed, they need not be related and can be associated with a single transcript (e.g. two non-naturally occurring miRNAs present in the same 3′ UTR) or two separate transcripts (i.e., two non-naturally occurring miRNAs can be associated with and expressed from two unrelated transcripts). In cases where multiple non-naturally occurring miRNAs are expressed from a single vector, the non-naturally occurring miRNAs can be identical (e.g. two copies of a non-naturally occurring miR-196a-2 miRNA) or dissimilar (e.g. one copy of a non-naturally occurring miR-196a-2 miRNA and one copy of a non-naturally occurring miR-204 miRNA). Furthermore, the non-naturally occurring miRNAs can target a single target RNA (thus effectively having a pool of sequences targeting one gene product) or can target multiple genes (i.e. multigene targeting). Both pooling and multigene targeting can also be achieved with the non-naturally occurring miRNAs of the disclosure by another means. Specifically, multiple non-naturally occurring miRNAs targeting one or more target RNAs can be inserted into multiple vectors and then combined (mixed) and 1) transfected, or 2) transduced into the cell type of interest.

Vector constructs that encode non-naturally occurring miRNAs may be introduced into a cell by any method that is now known or that comes to be known and that persons skilled in the art, from reading this disclosure, would determine to be useful in connection with the present disclosure. These methods include, but are not limited to, any manner of transfection, such as for example transfection employing DEAE-Dextran, calcium phosphate, cationic lipids/liposomes, micelles, manipulation of pressure, microinjection, electroporation, immunoporation, use of viral vectors, cosmids, bacteriophages, cell fusions, and coupling to specific conjugates or ligands.

In cases where a non-naturally occurring miRNA is delivered to a cell using a virus, the vector construct can be maintained in the cytoplasm or can be integrated into and expressed from the host genome (e.g. lentiviral). Such vectors frequently include sequences necessary for packaging such viruses, but lack functions that are provided by “helper” plasmids to avoid the generation of infectious particles. Furthermore, when viral systems are being used, the level of expression of the construct can be manipulated by altering the promoter driving the expression of the construct. Alternatively, the expression levels can be altered by adjusting the multiplicity of infection (MOI), effectively altering the number of copies of the expression cassette that are placed in each cell.

According to another embodiment, the present disclosure provides a kit comprised of at least one non-naturally occurring miRNA.

According to another embodiment, the present disclosure provides a kit comprised of at least one vector construct that encodes a non-naturally occurring miRNA.

According to another embodiment, the present disclosure provides a kit comprised of at least one miRNA scaffold. The miRNA scaffold can then be used to generate a plurality of different non-naturally occurring miRNAs by cloning mature strand and star strand sequences into the miRNA scaffold, and then expressing the resulting non-naturally occurring miRNAs in a cell.

The miRNA scaffolds, non-naturally occurring miRNAs, and methods of the disclosure may be used in a diverse set of applications, including but not limited to basic research, drug discovery and development, diagnostics, and therapeutics. In each case, the non-naturally occurring miRNA produced by introducing a mature strand sequence into a miRNA scaffold is used to lower the functional capacity of a target RNA such as an mRNA produced by a gene of interest. In research settings, the compositions and methods of the disclosure may be used to validate whether a gene product is a target for drug discovery or development.

Because the ability of the mature strand sequences embedded in the non-naturally occurring miRNA of the disclosure to function in the RNAi pathway is dependent on the sequence of the target RNA (e.g., an mRNA produced by a particular gene) and not the species into which it is introduced, the methods and compositions of the disclosure may be used to target genes across a broad range of species, including but not limited to all mammalian species, such as humans, dogs, horses, cats, cows, mice, hamsters, chimpanzees and gorillas, as well as other species and organisms such as bacteria, viruses, insects, plants and worms.

The methods and compositions of the disclosure are also applicable for silencing a broad range of genes, including but not limited to the roughly 45,000 genes of a human genome, and have particular relevance in cases where those genes are associated with diseases such as diabetes, Alzheimer's, and cancer, as well as all genes in the genomes of the aforementioned organisms.

In yet another application, non-naturally occurring miRNAs directed against a particular family of genes (e.g., kinases), genes associated with a particular pathway(s) (e.g., cell cycle regulation), or entire genomes (e.g., the human, rat, mouse, C. elegans, or Drosophila genome) are provided. Knockdown of each gene of the collection with non-naturally occurring miRNAs that comprise mature strand sequences at least partially complementary to an RNA product of the genes would enable researchers to quickly assess the contribution of each member of a family of genes, or each member of a pathway, or each gene in a genome, to a particular biological function.

The methods and compositions of the disclosure may be employed in RNA interference applications that require induction of transient or permanent states of disease or disorder in an organism by, for example, attenuating the activity of a target RNA of interest believed to be a cause or factor in the disease or disorder of interest. Increased activity of the target RNA of interest may render the disease or disorder worse, or tend to ameliorate or to cure the disease or disorder of interest, as the case may be. Likewise, decreased activity of the target nucleic acid of interest may cause the disease or disorder, render it worse, or tend to ameliorate or cure it, as the case may be. Target RNA of interest can comprise genomic or chromosomal nucleic acids or extrachromosomal nucleic acids, such as viral nucleic acids.

Still further, the methods and compositions of the disclosure may be used in RNA interference applications, such as prophylactics and therapeutics. For these applications, an organism suspected of having a disease or disorder that is amenable to modulation by manipulation of a particular target RNA of interest is treated by administering targeting sequences embedded in preferred scaffold expression systems. Results of the treatment may be ameliorative, palliative, prophylactic, and/or diagnostic of a particular disease or disorder. Preferably, the targeting sequence is administered in a pharmaceutically acceptable manner with a pharmaceutically acceptable carrier, diluent, or delivery system (e.g. a virus).

Further, the mature non-naturally occurring miRNAs of the disclosure can be administered by a range of delivery routes including intravenous, intramuscular, dermal, subdermal, cutaneous, subcutaneous, intranasal, oral, or rectal routes, by eye drops, or by tissue implantation of a device that releases the agent at an advantageous location, such as near an organ or tissue or cell type harboring a target nucleic acid of interest.

Further, the disclosure discloses the use of a non-naturally occurring miRNA in the manufacture of a medicament for the treatment of a disease characterized by the inappropriate expression of a gene, wherein the gene is targeted by the non-naturally occurring miRNA.

The illustrative preferred embodiments of the present invention are explained in the drawings and described in detail, with varying modifications and alternative embodiments being taught. While the invention has been so shown, described and illustrated, it should be understood by those skilled in the art that equivalent changes in form and detail may be made therein without departing from the true spirit and scope of the invention, and that the scope of the invention is to be limited only to the claims except as precluded by the prior art. Moreover, the invention as disclosed herein may be suitably practiced in the absence of the specific elements which are disclosed herein.

All references cited in the present application are incorporated in their entirety herein by reference to the extent not inconsistent herewith.

The following examples are for illustrative purposes only and are not intended to limit the scope of the invention.

EXAMPLES

The following system of nomenclature was used to compare and report siRNA-silencing functionality: “F” followed by the degree of minimal knockdown. For example, F50 signifies at least 50% knockdown, F80 means at least 80% knockdown, and so forth. For this study, all sub-F50 RNAs were considered nonfunctional.
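The nomenclature amounts to a simple threshold test, which can be sketched as follows; the function names `meets_f` and `is_functional` are hypothetical.

```python
def meets_f(percent_knockdown: float, level: int) -> bool:
    """True if a sequence reaches F<level>, i.e. at least `level`% knockdown."""
    return percent_knockdown >= level


def is_functional(percent_knockdown: float) -> bool:
    """Sub-F50 RNAs are treated as nonfunctional in this study."""
    return meets_f(percent_knockdown, 50)
```

For instance, a sequence giving 82% knockdown meets F50 and F80 but would not meet a stricter level such as F90.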

General Techniques

Total genomic DNA extraction: Total HeLa genomic DNA was extracted using a DNeasy Genomic DNA isolation kit (Qiagen). Overall integrity of the DNA was verified on a 0.8% agarose gel stained with ethidium bromide.

PCR Amplification of miRNAs from Genomic DNA:

PCR was used to amplify various miRNAs for testing in the dual luciferase system. Natural miRNAs were amplified from 10-100 ng HeLa genomic DNA with Qiagen Taq PCR Master Mix (Cat No 201443) and 10 μM of each primer. The PCR parameters were: 4 min at 94° C. for initial denaturation; 15 seconds at 94° C., 30 seconds at 50-60° C., and 45 seconds at 72° C. for 30 cycles; and 2 min at 72° C. for final extension. Sequences used for amplification are provided in Table 3 below.

Table 3. SpeI and BglII represent restriction sites that were incorporated into the primer sequences. “For” = forward primer. “Rev” = reverse primer. All sequences are provided in 5′→3′ orientation.

TABLE 3

| Primer | Sequence of Primer, 5′→3′ | SEQ ID NO: |
|---|---|---|
| SpeImiR338-For | TCATACTAGTGAGACAGACCCTGCTTCGAAGGACC | 26 |
| BglIImiR338-Rev | TCATAGATCTTGTCCCTCCCCACATAAAACCCATG | 27 |
| SpeImiR30c-1-For | TCATACTAGTTTTTACTCAGCCAGCCCAAGTGGTTCTGTG | 28 |
| BglIImiR30c-1-Rev | TCATAGATCTACATCTGGTTCTGGTTGTACTTAGCCAC | 29 |
| SpeImiR-26b-For | TCATACTAGTTGGATACATGTGGAATGTCAGAGGC | 30 |
| BglIImiR-26b-Rev | TCATAGATCTTGACCACTGCTGGGGAAACTGTACC | 31 |
| SpeImiR196a-2-For | TCATACTAGTTCAGACCCCTTACCCACCCAGCAACC | 32 |
| BglIImiR196a-2-Rev | TCATAGATCTAGAGGACGGCATAAAGCAGGGTTCTCCAG | 33 |
| SpeImiR196a-1-For | TCATACTAGTTCCGATGTGTTGTTTAGTAGCAACTGGG | 34 |
| BglIImiR196a-1-Rev | TCATAGATCTGACACTTCCCAGATCTCTTCTCTGG | 35 |
| SpeImiR30a-For | TCATACTAGTCGGTGATGAATAATAGACATCCATGAGCC | 36 |
| BglIImiR30a-Rev | TCATAGATCTACCTCCTCAATGCCCTGCTGAAGC | 37 |
| SpeImiR126-For | TCATACTAGTGGCACTGGAATCTGGGCGGAAG | 38 |
| BglIImiR126-Rev | TCATAGATCTAGAAGACTCAGGCCCAGGCCTCTG | 39 |
| SpeImiR-204-For | TCATACTAGTTGAGGGTGGAGGCAAGCAGAGGACC | 40 |
| BglIImiR-204-Rev | TCATAGATCTTTGGACCCAGAACTATTAGTCTTTGAG | 41 |
| SpeImiR486-For | TCATACTAGTGCGGGCCCTGATTTTTGCCGAATGC | 42 |
| BglIImiR486-Rev | TCATAGATCTAGCATGGGGCAGTGTGGCCACAG | 43 |
| SpeImiR135a-2-For | TCATACTAGTAAATCTTGTTAATTCGTGATGTCACAATTC | 44 |
| BglIImiR135a-2-Rev | TCATAGATCTCACCTAGATTTCTCAGCTGTCAAATC | 45 |
| SpeImiR374-For | TCATACTAGTCAATTCCGTCTATGGCCACGGGTTAGG | 46 |
| BglIImiR374-Rev | TCATAGATCTTGTGGAGCTCACTTTAGCAGGCACAC | 47 |
| SpeImiR526a-1-For | TCATACTAGTAATGTAAGGTATGTGTAGTAGGCAATGC | 48 |
| BglIImiR526a-1-Rev | TCATAGATCTAGTTCCTGATACTGAGCTCCAGCCAG | 49 |

Two of the primer sets (for miR-338 and miR-135a-2) failed to amplify the respective sites. For the remaining scaffolds, the PCR product was gel-purified, treated with SpeI/BglII (NEB) and cloned into the MCS of a highly modified pCMV-Tag4 with GFP containing an artificial intron. Successful cloning was confirmed by sequencing. As a result of these procedures, the miRNAs are located within an artificial intron downstream of the ATG start site of GFP (see FIG. 4).

psiCheck Dual-Luc Reporter Constructs:

The dual-luciferase plasmid, psiCHECK™-2 Vector, containing both the humanized firefly luciferase gene (hluc) and the humanized Renilla luciferase gene (hRluc), each with its own promoter and poly(A)-addition sites, was obtained from Promega (Cat.# C8021). Reverse complement target sequences were inserted between the XhoI and NotI restriction sites in the multiple cloning site in the 3′ UTR of the hRluc gene. Insert sequences were ordered from Operon to make an insert compatible with the restriction sites. Firefly and Renilla luciferase activities were measured using the Dual-Glo™ Luciferase Assay System (Promega, Cat.# E2980) according to the manufacturer's instructions with slight modification. When lysing cells, growth media was aspirated from the cells prior to adding 50 μL of firefly luciferase substrate and 50 μL of Renilla luciferase substrate.

Cell viability was determined on a duplicate plate using the AlamarBlue® assay (BioSource Intl, Inc). Cell viabilities for control and experimentally treated cells were always within 15%.

For experiments requiring the quantitative determination of mRNA, cells were lysed in 1× lysis mixture and mRNA quantitation was performed by the branched DNA (bDNA) assay (QuantiGene® Screen Kit, Cat.# QG-000-050, Panomics). Branched DNA probes for targeted genes were designed by Panomics and in-house.

The luciferase, AlamarBlue and bDNA assays were all scanned with a Wallac Victor 1420 multilabel counter (Perkin Elmer) using programs as recommended by the manufacturers.

Cell Culture and Transfection:

One day prior to transfection, HeLa cells were plated in a 96-well plate at a cell density of about 10,000 cells per well in Dulbecco's modified Eagle medium (DMEM) supplemented with 10% fetal bovine serum (FBS) without antibiotics. On the day of transfection, the appropriate mixtures were prepared (e.g. psiCheck dual luciferase plasmid containing the appropriate target sequences; plasmids (control and experimental) expressing the scaffold construct; siRNAs (100 nM) targeting the target sequence; lipid delivery reagents (e.g. Lipofectamine 2000)). The mixtures were then introduced into cells using art-recognized transfection conditions.

Experimental Design and Data Analysis

All treatments were run in triplicate. To account for non-specific effects on reporter plasmids, experimental results are expressed as a normalized ratio (Rluc/Fluc)_norm: the ratio of Renilla luciferase expression to firefly luciferase expression for a given miRNA reporter plasmid, (Rluc/Fluc)_miRNA, divided by the (Rluc/Fluc)_control for a non-targeting sequence co-transfected with the reporter plasmid. The maximum values obtained from the reporter plasmid vary due to sequence; ideally, values around 1 indicate low miRNA function, while values close to zero indicate high miRNA function. Data are reported as the average of the three wells, and the error bars are the standard deviation of the three (Rluc/Fluc)_miRNA ratios from the experimental treatment, scaled by the normalizing factor (the average of (Rluc/Fluc)_control). We recognize that ratios do not follow a normal distribution, but believe that the standard deviation values give a good sense of the variability of the data.
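The calculation of the normalized ratio and its error bar can be sketched as follows. The function name `normalized_ratio` and the input layout (one (Rluc, Fluc) reading per well) are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not state which estimator was used.

```python
from statistics import mean, stdev


def normalized_ratio(mirna_wells, control_wells):
    """Return (normalized ratio, scaled SD) from per-well (Rluc, Fluc) pairs.

    mirna_wells:   readings for the miRNA-treated reporter wells
    control_wells: readings for the non-targeting control wells
    """
    mirna = [rluc / fluc for rluc, fluc in mirna_wells]
    control = [rluc / fluc for rluc, fluc in control_wells]
    factor = mean(control)                  # normalizing factor
    # Mean of the treated ratios and their sample SD, both scaled by the factor.
    return mean(mirna) / factor, stdev(mirna) / factor
```

A returned ratio near 1 would correspond to low miRNA function and a ratio near zero to high miRNA function.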

Example 1 Identification of High Performance miRNA Scaffolds

To identify highly functional miRNA scaffolds, ten separate miRNAs (including miR-126, -204, -196a2, -30c-1, -26b, -30a, -374, -196a1, -526, and -486) were PCR amplified from genomic DNA, and cloned into the SpeI/BglII sites of the artificial intron of GFP (see FIG. 4A-E). In parallel, dual-luciferase reporter plasmids containing the appropriate (reverse complement) target site in the 3′ UTR of hRluc were constructed (FIG. 4F and Table 4). Plasmids encoding both constructs (both the artificial miRNA expression vector and the dual-luciferase reporter vector) were co-transfected into HeLa cells (10K cells per well in a 96-well plate) using Lipofectamine 2000 (Invitrogen, 0.2 μl per well) by standard forward transfection techniques, and assessed 48 hours later to determine the level of knockdown of the luciferase reporter.

Results of these studies are presented in FIG. 5 and demonstrate that some miRNA scaffolds are more functional than others. Half of the constructs were eliminated from further study due to their inability to significantly silence the reporter construct (miR-30a, -374, -196a1, -526, and -486 all showed less than 80% silencing of the reporter construct). In addition, further studies of miR-126 found that both strands of the hairpin (the mature and the star strand, referred to as the 5′ and 3′ strands respectively) were functional. Although having both star strand and mature strand activity may be desirable in some applications (for example, to silence two different target genes), further studies of this construct were discontinued. The three remaining miR scaffolds (-204, -26b, and -196a2) were identified as being highly functional, providing >80% silencing of the dual luciferase reporter construct. Interestingly, miR-196a-1, which has the same mature sequence as miR-196a-2, was identified as one of the less optimal scaffolds, suggesting that the sequence that surrounds the mature miR sequence may play an important role in Drosha/Dicer processing and that these effects may have a significant impact on miR functionality.

Example 2 Modifying miR Scaffold Sequences to Enable Cloning of Foreign Sequences

A key attribute of a miRNA scaffold for delivery of targeting sequences is the ability to introduce (clone) sequences into the scaffold and retain functionality. To achieve this, the three top performing scaffolds identified in Example 1 (miR-26b, -196a2, and -204) were modified to incorporate restriction sites into the constructs using standard molecular biology techniques. Subsequently, each construct was tested using the appropriate dual luciferase reporter construct containing the reverse complement to the mature targeting strand, to determine whether the changes altered either mature or star strand activity. For miR-26b and miR-204, nucleotide changes were made to introduce a BlpI and a SacI restriction site into the construct (see FIGS. 6A, B, top). For miR-196a-2, a natural BlpI site was already present in the construct, and therefore second restriction sites (including SacI, ScaI, and XbaI) were tested in combination (FIG. 6C, top and bottom).

For miR-26b, incorporation of the SacI site had little or no effect on overall functionality of the mature strand (FIG. 6A, bottom, 85-90% functionality). Further modification with the additional BlpI site (BlpI/SacI) had a small effect on overall functionality, reducing silencing by the mature strand to about 80%. The combined BlpI/SacI modification further limited star strand activity. As a result of these studies, a complementary pair of restriction sites (BlpI/SacI) that could be used for cloning foreign sequences into the miR-26b scaffold was identified.

For miR-204, neither the incorporation of the SacI site nor the combined SacI/BlpI sites affected mature strand activity (FIG. 6B, bottom). As was observed with the miR-26b construct, modification of the scaffold to incorporate both restriction sites suppressed functionality of the star strand (˜60% silencing→˜40% silencing). Thus, as a result of these studies, two goals were achieved. First, a complementary pair of restriction sites that could be used for cloning foreign sequences into the miR-204 scaffold was identified. Second, modifications were identified that further limited the functionality of the miR-204 star strand.

Identifying a combination of restriction sites that was compatible with the miR-196a-2 scaffold (FIG. 6C) was found to be more problematic. As shown in FIG. 6D, addition of a ScaI site (or combination of a ScaI site with a SacI site) significantly decreased the mature strand activity, and both ScaI+ and ScaI+/SacI+ constructs exhibited enhanced activity of the star strand. As enhanced star strand activity is deemed undesirable, this restriction site combination was abandoned.

Fortuitously, incorporation of the SacI site alone had little effect on mature strand activity and further crippled the functionality of the star strand (FIG. 6D, 40% silencing→˜20% silencing). Thus, as was the case with miR-204, two separate goals were achieved. First, a complementary pair of restriction sites that could be used for cloning foreign sequences into the miR-196a-2 scaffold was identified (BlpI, SacI). Second, modifications were identified that further limited the functionality of the miR-196a-2 star strand.

Example 3 Identifying an miRNA Scaffold that Readily Accepted Foreign Sequences

To determine which of the three preferred miRNA scaffolds most readily accepted foreign sequences, a “walk” of sequences targeting GAPDH was embedded into each of the scaffolds under study and cloned into an artificial intron in GFP. The walk consists of sequences that are 21 bp in length, with the 5′ terminus of each consecutive sequence shifted by 2 bp (FIG. 7A). In the case of all three vectors, inserts were (to the best of our abilities) designed to preserve natural secondary structures (e.g. bulges, mismatches) that were present in each of the endogenous scaffolds. In addition, a fourth walk consisting of each sequence embedded in the miR-196a-2 scaffold without secondary structure (i.e. simple hairpins) was performed to better understand the importance of secondary structure in functionality. The results of each of these walks were compared with results obtained when equivalent synthetic siRNAs were transfected into the cells. In addition, the GAPDH target sequence that was embedded into the psiCheck (dual luciferase) reporter is provided in FIG. 7B.
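The tiling scheme of the walk (21-nucleotide windows, each shifted by 2 nucleotides) can be sketched in Python as follows. This is an illustrative sketch only: the function name `sequence_walk` and the placeholder target are assumptions, and the placeholder is not the GAPDH target sequence of FIG. 7B.

```python
from typing import Iterator, Tuple


def sequence_walk(target: str, length: int = 21, step: int = 2) -> Iterator[Tuple[int, str]]:
    """Yield (offset, window) pairs tiling `target` with fixed-length windows.

    `length` and `step` default to the 21 bp window and 2 bp shift
    described for the walk; only complete windows are yielded.
    """
    for start in range(0, len(target) - length + 1, step):
        yield start, target[start:start + length]


if __name__ == "__main__":
    # Placeholder 32-nt target for demonstration, NOT the GAPDH sequence.
    demo_target = "ACGT" * 8
    for offset, window in sequence_walk(demo_target):
        print(offset, window)
```

Each yielded window would then be adapted to the scaffold (for example, by introducing the bulges and mismatches of the endogenous hairpin), a step this sketch does not attempt.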

FIG. 7C shows the functionality of each sequence in the walk when it is introduced into the cell as a synthetic 19 bp siRNA. As observed previously, small changes in the position of the targeting siRNA can greatly alter functionality. When those same sequences are introduced into miR-26b, miR-204, and miR-196a-2 and delivered as an expression construct, the latter two scaffolds exhibit greater levels of functionality than the miR-26b scaffold (FIGS. 7D, E, and F). Interestingly, when all secondary structure was eliminated from sequences incorporated into the miR-196a-2 structure, functionality was found to be greatly suppressed (FIG. 7G). A side-by-side comparison showed that some scaffolds (e.g. miR-196a-2 and -204) provided functionality with a greater number of sequences than other scaffolds (e.g. miR-26b, see FIG. 7H). Together, these findings demonstrate that all three scaffolds (most preferably the miR-204 and miR-196a-2 scaffolds) are useful for delivering foreign sequences and that preserving secondary structure is preferred for optimal functionality.

Example 4 Analysis of Preferred Targeting Sequences

Highly functional sequences (>70%) from the miR-196a-2 GAPDH walk were assessed to identify position-specific preferences. When this was performed, it was immediately clear that a “U” at position 1 in the mature strand (a “T” in the DNA encoding that position) was characteristic of highly functional sequences targeting foreign genes. For this reason, this criterion was the first to be identified as desirable for optimal functionality. At positions 5 and 6 there was a preference for Ts and As, respectively. At position 7, few of the functional sequences had a G (FIG. 8A). At position 12, which is the site of a mismatch in the endogenous miR-196a-2, there was an under-representation of “A” and an over-representation of “T”. A similar over-representation of “T” was observed in functional sequences at position 13. In this way, site-specific preferences for particular nucleotides were identified.
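A position-specific tally of this kind can be sketched in Python as shown below. The function name `position_frequencies` and the short demonstration sequences are hypothetical and are not data from the GAPDH walk; the sketch only shows how nucleotide counts per position might be collected from a set of equal-length functional sequences.

```python
from collections import Counter
from typing import Dict, List


def position_frequencies(seqs: List[str]) -> Dict[int, Counter]:
    """Map each 1-based position to a Counter of the nucleotides seen there."""
    if not seqs:
        return {}
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same length")
    # Count the base found at index i across every sequence.
    return {i + 1: Counter(s[i] for s in seqs) for i in range(length)}


if __name__ == "__main__":
    # Made-up 4-nt sequences, for demonstration only.
    demo = ["UACG", "UAGG", "UCCG"]
    for position, counts in position_frequencies(demo).items():
        print(position, dict(counts))
```

Comparing such counts between functional and nonfunctional sets is one way to surface over- or under-represented nucleotides at a given position.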

Further studies were then conducted to identify the importance of each secondary structure. Substituting a mismatch for the GU wobble found at position 1 was observed to be detrimental to overall functionality. Similarly, expanding the size of the mismatch found at position 12 to positions 12 and 13 was also found to be detrimental.

Further analysis of functional and nonfunctional sequences identified strong correlations between functionality and GC content. A comparison of the overall GC content and functionality from the sequences tested in the miR-196a-2 GAPDH walk study showed that, in general, the most highly functional sequences had lower GC content. As shown in FIG. 8B, of the 25 sequences having 10 or fewer G or C nucleotides, 18 (72%) exhibited 50% silencing or greater. In contrast, of the 22 sequences having 11 or more G or C nucleotides, 17 (77%) showed less than 50% silencing, suggesting that overall GC content should be considered in designing foreign sequences to be inserted into the miR-196a-2 scaffold.
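The GC threshold described above (10 or fewer G or C nucleotides versus 11 or more) can be expressed as a simple filter. The following Python sketch uses hypothetical function names (`gc_count`, `passes_gc_filter`, `percent`); the counts 18 of 25 and 17 of 22 are taken from the text, and the example sequences are made up.

```python
def gc_count(seq: str) -> int:
    """Count G and C nucleotides in a sequence (case-insensitive)."""
    return sum(1 for base in seq.upper() if base in "GC")


def passes_gc_filter(seq: str, max_gc: int = 10) -> bool:
    """Return True when the sequence has at most `max_gc` G or C nucleotides."""
    return gc_count(seq) <= max_gc


def percent(part: int, whole: int) -> int:
    """Percentage rounded to the nearest whole number."""
    return round(100 * part / whole)


if __name__ == "__main__":
    # Reproduce the percentages quoted in the text.
    print(percent(18, 25))  # low-GC group with >= 50% silencing
    print(percent(17, 22))  # high-GC group with < 50% silencing
    # Made-up 21-nt sequences, for demonstration only.
    print(passes_gc_filter("UAUAUGCAUAUAUGCAUAUAU"))
    print(passes_gc_filter("GCGCGCGCGCGCAUAUAUAUA"))
```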

Example 5 Comparison of siRNA and shRNA Algorithms

The results obtained from the previous Examples were used to develop an algorithm for identifying target sites that could be targeted efficiently with foreign sequences inserted into the miR-196a-2 scaffold (see Formulas I and II and related descriptions). A side-by-side comparison between target sites identified in the CDC2 gene by the miR-196a-2 algorithm and an algorithm used to design siRNA (see U.S. patent application Ser. No. 10/940,892, filed Sep. 14, 2004, published as U.S. Pat. App. Pub. No. 2005/0255487) shows that the two algorithms identify different sequences with very little overlap (see FIG. 9A).

Subsequently, the miR-196a-2 algorithm and the siRNA algorithm were applied to two genes, MAPK1 and EGFR (see FIG. 9B-9C). Targeting sequences were then cloned into the miR-196a-2 scaffold using the previously described restriction sites and co-transfected into HeLa cells along with the appropriate dual luciferase reporter construct (see FIG. 9B, C for target sequences). The results are shown in FIG. 9D and show that for EGFR, only two of the five clones selected by the siRNA algorithm provided greater than 80% gene knockdown. In contrast, four out of the five clones selected by the new miR-196a-2 algorithm gave >80% knockdown. For MAPK1, three of the four sequences selected by the siRNA algorithm provided >80% knockdown. In contrast, all five clones selected by the new miR-196a-2 algorithm gave >80% knockdown.

In a further test of the effectiveness of the miR-196a-2 design algorithm, targeting sequences against CDC2 (NM_001786), CD28 (NM_006139), CD69 (NM_001781), and LAT (NM_014387) were designed and cloned into the miR-196a-2 scaffold and subsequently tested for the ability to knock down the target gene using the dual luciferase assay. The results of these studies are found in FIG. 9E and show that 22 out of the 25 sequences (88%) that were designed using the miR-196a-2 algorithm provided greater than 75% silencing.

Detailed studies of constructs that failed to provide sufficient knockdown of a target (i.e. <50%, Zap70, FIG. 9F provides target sequence) revealed that a large number of these sequences selected by the algorithms disclosed herein contained strings of Gs and Cs, particularly in the seed region of the mature strand (see FIG. 9G). Subsequent analysis of functional targeting sequences showed that there was a preference for low instability in the mature strand seed region (FIG. 9H). For this reason, additional penalties were incorporated into the miR-196a-2 algorithm to limit GC content in this region.
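One way such a seed-region check might look is sketched below in Python. This is not the penalty scheme of the disclosed algorithm, whose weights are not given here: the function names and the `per_base` weight are hypothetical, and the seed region is taken as nucleotides 2-7 of the mature strand, as described in the Background.

```python
def seed_region(mature: str) -> str:
    """Return nucleotides 2-7 (1-based) of the mature strand."""
    return mature[1:7]


def longest_gc_run(seq: str) -> int:
    """Length of the longest uninterrupted string of G or C nucleotides."""
    longest = current = 0
    for base in seq.upper():
        if base in "GC":
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest


def seed_gc_penalty(mature: str, per_base: float = 1.0) -> float:
    """Hypothetical penalty: `per_base` for every G or C in the seed region."""
    return per_base * sum(base in "GC" for base in seed_region(mature).upper())


if __name__ == "__main__":
    # Made-up 19-nt mature strands, for demonstration only.
    for mature in ("UGCGCGCAAUAUAUAUAUA", "UAUAUAUGGGGGGGGGGGG"):
        print(mature, seed_region(mature), seed_gc_penalty(mature))
```

A candidate with a high penalty or a long G/C run in the seed region would be ranked lower under this illustrative scheme.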

Table 4 below provides the sequences used in Examples 1-3 (all in 5′→3′ direction):

TABLE 4

Dual Luc reporter sequences for Screening miRs for functionality
sense miR-338 TCGAATGACCCTTCAACAAAATCACTGATGCTGGAGTCTCGAGCTGC (SEQ ID NO: 50)
sense miR-30c-1 TCGAATGACCCAGCTGAGAGTGTAGGATGTTTACACACTCGAGCTGC (SEQ ID NO: 51)
sense miR-26b TCGAATGACCACAACCTATCCTGAATTACTTGAACTCTCGAGCTGC (SEQ ID NO: 52)
sense miR-196a-2 TCGAATGACCTCCCAACAACATGAAACTACCTAAGCTCGAGCTGC (SEQ ID NO: 53)
sense miR-196a-1 TCGAATGACCGCCCAACAACATGAAACTACCTAATCTCGAGCTGC (SEQ ID NO: 54)
sense miR-30a-5p TCGAATGACCAGCTTCCAGTCGAGGATGTTTACAGTCTCGAGCTGC (SEQ ID NO: 55)
sense miR-126* TCGAATGACCAGCGCGTACCAAAAGTAATAATGTCCTCGAGCTGC (SEQ ID NO: 56)
sense miR-126 TCGAATGACCGCGCATTATTACTCACGGTACGAGTCTCGAGCTGC (SEQ ID NO: 57)
sense miR-204 TCGAATGACCTCAGGCATAGGATGACAAAGGGAAGTCTCGAGCTGC (SEQ ID NO: 58)
sense miR-135a-2 TCGAATGACCTATCACATAGGAATAAAAAGCCATAAACTCGAGCTGC (SEQ ID NO: 59)
sense miR-374 TCGAATGACCAACACTTATCAGGTTGTATTATAATGCTCGAGCTGC (SEQ ID NO: 60)
sense miR-526a-1 TCGAATGACCACAGAAAGTGCTTCCCTCTAGAGGGCTCGAGCTGC (SEQ ID NO: 61)
sense miR-486 TCGAATGACCAGCTCGGGGCAGCTCAGTACAGGATACTCGAGCTGC (SEQ ID NO: 62)

Dual Luc reporter sequences for detection of star strand activity
sense miR-204 ggaggctgggaaggcaaagggacgt (SEQ ID NO: 63)
sense miR-26b ccagcctgttctccattacttggct (SEQ ID NO: 64)
sense miR-196a-2 actcggcaacaagaaactgcctgag (SEQ ID NO: 65)
sense miR-30a-3p TCGAATGACCCAGCTGCAAACATCCGACTGAAAGCCCTCGAGCTGC (SEQ ID NO: 66)

Artificial Intron
CAGGTAAGTTAGTAGATAGATAGCGTGCTATTTACTAGTCGTAGATCTACAATGTTGAATTCTCACGCGGCCGCTCTACTAACCCTTCTTTTCTTTCTCTTCCTTTCATCTTTCAGGCG (SEQ ID NO: 67)

Probe sequence used for northern blot analysis
PROBE miR-196a-2as CCAACAACATGAAACTACCTA (SEQ ID NO: 68)

EGFR and MAPK sequences-siRNA design-sense strand
S-196a-2-EGFR-1 TCAGCTGATCTGTGGCTTTTCGTAGTACATATTTCCTCGATTGAGTTTTGAACGAGGAAATAAGTACTATGAAGAGTTACATCAGTCGGTTTTCGTCGAGGGCCCCAACCGAGCT (SEQ ID NO: 69)
S-196a-2-EGFR-2 TCAGCTGATCTGTGGCTAACTGCGTGAGCTTGTTACTCGATTGAGTTTTGAACGAGTAACAACCTCACGTAGTTAGTTACATCAGTCGGTTTTCGTCGAGGGCCCCAACCGAGCT (SEQ ID NO: 70)
S-196a-2-EGFR-3 TCAGCTGATCTGTGGCTCATTGGGACAGCTTGGATCACGATTGAGTTTTGAACGTGGTCCAACCTGTCCTAATGAGTTACATCAGTCGGTTTTCGTCGAGGGCCCCAACCGAGCT (SEQ ID NO: 71)
S-196a-2-EGFR-4 TCAGCTGATCTGTGGCTTCTGTCACCACATAATTACGGGATTGAGTTTTGAACTCGTAATTAAGTGGTGGCAGGAGTTACATCAGTCGGTTTTCGTCGAGGGCCCCAACCGAGCT (SEQ ID NO: 72)
S-196a-2-EGFR-5 TCAGCTGATCTGTGGCTTATTCCGTTACACACTTTGCGGATTGAGTTTTGAACTGTGAAGTGAGTAACGGAATGAGTTACATCAGTCGGTTTTCGTCGAGGGCCCCAACCGAGCT (SEQ ID NO: 73)
S_196a-2-MAPK1-1 TCAGCTGATCTGTGGCTAATTTCTGGAGCCCTGTACCAGATTGAGTTTTGAACTGGTACAGGCGTCCAGGAATTAGTTACATCAGTCGGTTTTCGTCGAGGGCCCCAACCGAGCT (SEQ ID NO: 74)
S_196a-2-MAPK1-2 TCAGCTGATCTGTGGCTCTTGTAAAGATCTGTTTCCATGATTGAGTTTTGAACGTGGAAACACATCTTTGCAAGAGTTACATCAGTCGGTTTTCGTCGAGGGCCCCAACCGAGCT (SEQ ID NO: 75)
S_196a-2-MAPK1-3 TCAGCTGATCTGTGGCTAATAAGTCCAGAGCTTTGGAGGATTGAGTTTTGAACTTTTAAAGCACTGGACTTATTAGTTACATCAGTCGGTTTTCGTCGAGGGCCCCAACCGAGCT (SEQ ID NO: 76)
S_196a-2-MAPK1-4 TCAGCTGATCTGTGGCTAAAGCAAATAGTTCCTAGCTTGATTGAGTTTTGAACGAGTTAGGATCTATTTGCTTTAGTTACATCAGTCGGTTTTCGTCGAGGGCCCCAACCGAGCT (SEQ ID NO: 77)
S_196a-2-MAPK1-5 TCAGCTGATCTGTGGCTTACAATTCAGGTCTTCTTGTGGATTGAGTTTTGAACTATGAGAAGTCCTGAATTGTGAGTTACATCAGTCGGTTTTCGTCGAGGGCCCCAACCGAGCT (SEQ ID NO: 78)

EGFR and MAPK sequences-shRNA design-sense strand
EGFR A TGATCTGTGGCTTATTCGTAGCATTTATGGAGGGATTGAGTTTTGAACTCTTCATAATTGCTACGAATGAGTTACATCAGTCGGTTTTCG (SEQ ID NO: 79)
EGFR B TGATCTGTGGCTTCGTAGTACATATTTCCTCGGGATTGAGTTTTGAACTCGGGGAAAAATGTACTACGGAGTTACATCAGTCGGTTTTCG (SEQ ID NO: 80)
EGFR C TGATCTGTGGCTTCGTCTCGGAATTTGCGGCGGGATTGAGTTTTGAACTCGTCGCAATTTCCGAGACGGAGTTACATCAGTCGGTTTTCG (SEQ ID NO: 81)
EGFR D TGATCTGTGGCTTACGGTTTTCAGAATATCCGGGATTGAGTTTTGAACTCGGATATTGTGAAAATCGTGAGTTACATCAGTCGGTTTTCG (SEQ ID NO: 82)
EGFR E TGATCTGTGGCTTCCGGTTTTATTTGCATCAGGGATTGAGTTTTGAACTCTGATGCATATAAAATCGGGAGTTACATCAGTCGGTTTTCG (SEQ ID NO: 83)
MAPK1 A TGATCTGTGGCTTTGCTCGATGGTTGGTGCTGGGATTGAGTTTTGAACTCGGCACCATCCATCGGGCAGAGTTACATCAGTCGGTTTTCG (SEQ ID NO: 84)
MAPK1 B TGATCTGTGGCTTCGAACTTGAATGGTGCTTGGGATTGAGTTTTGAACTCGGGCACCTTTCAAGTTCGGAGTTACATCAGTCGGTTTTCG (SEQ ID NO: 85)
MAPK1 C TGATCTGTGGCTTACTCGAACTTTGTTGACAGGGATTGAGTTTTGAACTCTGTCAACTAAGTTCGAGTGAGTTACATCAGTCGGTTTTCG (SEQ ID NO: 86)
MAPK1 D TGATCTGTGGCTTCGTAATACTGCTCCAGATGGGATTGAGTTTTGAACTCGTCTGGACCAGTATTACGGAGTTACATCAGTCGGTTTTCG (SEQ ID NO: 87)
MAPK1 E TGATCTGTGGCTTCCATGAGGTCCTGTACTAGGGATTGAGTTTTGAACTCTGGTACACGACCTCGTGGGAGTTACATCAGTCGGTTTTCG (SEQ ID NO: 88)

1-59. (canceled)
60. A non-naturally occurring miR-196a-2 miRNA comprising a nucleic acid having a stem-loop structure wherein the stem of the stem-loop structure incorporates a mature strand-star strand duplex, wherein the sequence of said mature strand is distinct from the sequence of the endogenous mature strand of miR-196a-2 and is at least partially complementary to a portion of a target RNA, and wherein said star strand is at least partially complementary to said mature strand.
61. The non-naturally occurring miR-196a-2 miRNA of claim 60 comprising at least one of: a nucleotide deletion relative to the sequence of the stem of endogenous miR-196a-2; a nucleotide addition relative to the sequence of the stem of endogenous miR-196a-2; a nucleotide substitution relative to the sequence of the stem of endogenous miR-196a-2; a nucleotide deletion relative to the sequence of the loop of endogenous miR-196a-2; a nucleotide addition relative to the sequence of the loop of endogenous miR-196a-2; a nucleotide substitution relative to the sequence of the loop of endogenous miR-196a-2; a 5′ flanking nucleotide sequence of between about 5-150 nucleotides in length; and a 3′ flanking nucleotide sequence of between about 5-150 nucleotides in length.
62. The non-naturally occurring miR-196a-2 miRNA of claim 60 comprising the sequence (SEQ ID NO:10):

wherein M is said mature strand and wherein S is said star strand.
63. The non-naturally occurring miR-196a-2 miRNA of claim 60 wherein said mature strand is between about 19-25 nucleotides in length.
64. The non-naturally occurring miR-196a-2 miRNA of claim 60 wherein said mature strand is about 19 nucleotides in length.
65. The non-naturally occurring miR-196a-2 miRNA of claim 60 wherein the nucleotide at position 1 of said mature strand is U.
66. The non-naturally occurring miR-196a-2 miRNA of claim 65 wherein said star strand comprises a G nucleotide at the nucleotide position which is opposite position 1 of said mature strand.
67. The non-naturally occurring miR-196a-2 miRNA of claim 60 wherein the nucleotide at position 12 of said mature strand does not form a base pair with the nucleotide at the opposite position on said star strand.
68. The non-naturally occurring miR-196a-2 miRNA of claim 60 wherein the sequence of said mature strand is distinct from any endogenous miRNA mature strand.
69. A recombinant expression vector comprising a nucleotide sequence that encodes a non-naturally occurring miR-196a-2 miRNA, said non-naturally occurring miR-196a-2 miRNA comprising a nucleic acid having a stem-loop structure wherein the stem of the stem-loop structure incorporates a mature strand-star strand duplex, wherein the sequence of said mature strand is distinct from the sequence of the endogenous mature strand of miR-196a-2 and is at least partially complementary to a portion of a target RNA, and wherein said star strand is at least partially complementary to said mature strand.
70. The recombinant expression vector of claim 69 wherein said vector comprises a promoter operably linked to a reporter gene comprising an artificial intron, and wherein said non-naturally occurring miRNA is located within said artificial intron.
71. The recombinant expression vector of claim 70 wherein said vector comprises a promoter operably linked to a reporter gene having a 3′ untranslated region (UTR), and wherein said non-naturally occurring miRNA is located within said 3′ UTR.
72. The recombinant expression vector of claim 69, wherein said vector is a lentiviral vector.
73. A pharmaceutical composition comprising: a non-naturally occurring miR-196a-2 miRNA, said non-naturally occurring miR-196a-2 miRNA comprising a nucleic acid having a stem-loop structure wherein the stem of the stem-loop structure incorporates a mature strand-star strand duplex, wherein the sequence of said mature strand is distinct from the sequence of the endogenous mature strand of miR-196a-2 and is at least partially complementary to a portion of a target RNA, and wherein said star strand is at least partially complementary to said mature strand; and at least one pharmaceutically acceptable carrier.
74. A cell comprising a non-naturally occurring miR-196a-2 miRNA, said non-naturally occurring miR-196a-2 miRNA comprising a nucleic acid having a stem-loop structure wherein the stem of the stem-loop structure incorporates a mature strand-star strand duplex, wherein the sequence of said mature strand is distinct from the sequence of the endogenous mature strand of miR-196a-2 and is at least partially complementary to a portion of a target RNA, and wherein said star strand is at least partially complementary to said mature strand.
75. A method of lowering the functional capacity of a target RNA in a cell, the method comprising: contacting said cell with a non-naturally occurring miR-196a-2 miRNA, said non-naturally occurring miR-196a-2 miRNA comprising a nucleic acid having a stem-loop structure wherein the stem of the stem-loop structure incorporates a mature strand-star strand duplex, wherein the sequence of said mature strand is distinct from the sequence of the endogenous mature strand of miR-196a-2 and is at least partially complementary to a portion of a target RNA, and wherein said star strand is at least partially complementary to said mature strand; wherein said non-naturally occurring miRNA is processed in said cell to yield a mature miRNA having the sequence of said mature strand.
76. A method for selecting the sequence of a mature strand and a star strand of a non-naturally occurring miR-196a-2 miRNA capable of reducing the functional activity of a target RNA when expressed in a cell, the method comprising: analyzing the nucleotide sequence of said target RNA to identify a subsequence: 5′R₁R₂R₃R₄R₅R₆R₇R₈R₉R₁₀R₁₁R₁₂R₁₃R₁₄R₁₅R₁₆R₁₇R₁₈A₁₉ 3′

wherein R₁-R₁₈ are nucleotides selected according to at least one criterion selected from the group consisting of:
1) at R₁, G is favored over A, A is favored over U, and U is favored over C;
2) at R₂, A is favored over each of U and G, and each of U and G is favored over C;
3) at R₃, each of U, G, and A is favored over C;
4) at R₄, each of U, G, and A is favored over C;
5) at R₅, A is favored over each of U and G, and each of U and G is favored over C;
6) at R₇, A is favored over each of U and C, and each of U and C is favored over G;
7) at R₈, A is favored over each of U and G, and each of U and G is favored over C;
8) at R₁₂, A is favored over each of U and C, and each of U and C is favored over G;
9) at R₁₃, U is favored over each of G and A, and each of G and A is favored over C;
10) at R₁₄, U is favored over each of C and G, and each of C and G is favored over A;
11) at R₁₅, A is favored over each of C, G, and U;
12) said subsequence does not include a tetranucleotide sequence selected from the group consisting of AAAA, UUUU, GGGG, and CCCC;
13) said subsequence has a total G+C content of not more than 10; and
14) said subsequence has a G+C content of not more than 4 between R₁₂-R₁₈;
selecting the sequence of said mature strand wherein said mature strand is at least partially complementary to said subsequence; and
selecting the sequence of said star strand wherein said star strand is at least partially complementary to said mature strand.
77. The method of claim 76 wherein said mature strand sequence comprises: 5′U₁M₂M₃M₄M₅M₆M₇M₈M₉M₁₀M₁₁M₁₂M₁₃M₁₄M₁₅M₁₆M₁₇M₁₈M₁₉ 3′

wherein M₂-M₁₉ are nucleotides.
78. The method of claim 76 wherein said star strand sequence comprises 5′S₁S₂S₃S₄S₅S₆S₇S₈S₉S₁₀S₁₁S₁₂S₁₃S₁₄S₁₅S₁₆S₁₇S₁₈G₁₉ 3′

wherein S₁-S₁₈ are nucleotides, and wherein M₁₂ and S₈ are not complementary bases.
79. The method of claim 76 wherein said non-naturally occurring miR-196a-2 comprises the sequence (SEQ ID NO:10):

wherein M is said mature strand and S is said star strand sequence.
80. A non-naturally occurring miR-196a-2 miRNA capable of reducing the functional activity of a target RNA when expressed in a cell, said non-naturally occurring miR-196a-2 miRNA comprising a mature strand and a star strand, wherein the sequence of said mature strand and said star strand is selected according to the method comprising: analyzing the nucleotide sequence of said target RNA to identify a subsequence: 5′R₁R₂R₃R₄R₅R₆R₇R₈R₉R₁₀R₁₁R₁₂R₁₃R₁₄R₁₅R₁₆R₁₇R₁₈A₁₉ 3′

wherein R₁-R₁₈ are nucleotides selected according to at least one criterion selected from the group consisting of:
1) at R₁, G is favored over A, A is favored over U, and U is favored over C;
2) at R₂, A is favored over each of U and G, and each of U and G is favored over C;
3) at R₃, each of U, G, and A is favored over C;
4) at R₄, each of U, G, and A is favored over C;
5) at R₅, A is favored over each of U and G, and each of U and G is favored over C;
6) at R₇, A is favored over each of U and C, and each of U and C is favored over G;
7) at R₈, A is favored over each of U and G, and each of U and G is favored over C;
8) at R₁₂, A is favored over each of U and C, and each of U and C is favored over G;
9) at R₁₃, U is favored over each of G and A, and each of G and A is favored over C;
10) at R₁₄, U is favored over each of C and G, and each of C and G is favored over A;
11) at R₁₅, A is favored over each of C, G, and U;
12) said subsequence does not include a tetranucleotide sequence selected from the group consisting of AAAA, UUUU, GGGG, and CCCC;
13) said subsequence has a total G+C content of not more than 10; and
14) said subsequence has a G+C content of not more than 4 between R₁₂-R₁₈;
selecting the sequence of said mature strand wherein said mature strand is at least partially complementary to said subsequence; and
selecting the sequence of said star strand wherein said star strand is at least partially complementary to said mature strand.
81. A method for selecting the sequence of a mature strand and a star strand of a non-naturally occurring miR-196a-2 miRNA capable of reducing the functional activity of a target RNA when expressed in a cell, the method comprising: analyzing the nucleotide sequence of said target RNA to identify a subsequence: 5′R₁R₂R₃R₄R₅R₆R₇R₈R₉R₁₀R₁₁R₁₂R₁₃R₁₄R₁₅R₁₆R₁₇R₁₈R₁₉R₂₀A₂₁ 3′

wherein R₁-R₂₀ are selected according to at least one criterion selected from the group consisting of:
1) at R₁, A is favored over each of U, C, and G;
2) at R₂, each of U, C, and A is favored over G;
3) at R₃, G is favored over A, A is favored over U, and U is favored over C;
4) at R₄, A is favored over each of U and G, and each of U and G is favored over C;
5) at R₅, A is favored over each of U and G, and each of U and G is favored over C;
6) at R₆, each of U and A is favored over C, and C is favored over G;
7) at R₇, A is favored over each of U and G, and each of U and G is favored over C;
8) at R₈, each of U, C, and A is favored over G;
9) at R₉, A is favored over each of U and C, and each of U and C is favored over G;
10) at R₁₀, A is favored over each of U and G, and each of U and G is favored over C;
11) at R₁₄, A is favored over each of U and C, and each of U and C is favored over G;
12) at R₁₅, A is favored over U, U is favored over G, and G is favored over C;
13) at R₁₆, U is favored over each of C, G and A;
14) at R₁₇, A is favored over C, C is favored over U, and U is favored over G;
15) at R₁₉, each of U, A, and C is favored over G;
16) said subsequence does not include a tetranucleotide sequence selected from the group consisting of AAAA, UUUU, GGGG, and CCCC;
17) said subsequence has a total G+C content of not more than 10; and
18) said subsequence has a G+C content of not more than 4 between R₁₄-R₂₀;
selecting the sequence of said mature strand wherein said mature strand is at least partially complementary to said subsequence; and
selecting the sequence of said star strand wherein said star strand is at least partially complementary to said mature strand.
82. The method of claim 81 wherein said mature strand sequence comprises: 5′U₁M₂M₃M₄M₅M₆M₇M₈M₉M₁₀M₁₁M₁₂M₁₃M₁₄M₁₅M₁₆M₁₇M₁₈M₁₉M₂₀M₂₁ 3′

wherein M₂-M₂₁ are nucleotides.
83. The method of claim 82 wherein said star strand sequence comprises 5′S₁S₂S₃S₄S₅S₆S₇S₈S₉S₁₀S₁₁S₁₂S₁₃S₁₄S₁₅S₁₆S₁₇S₁₈S₁₉S₂₀G₂₁ 3′

wherein S₁-S₂₀ are nucleotides, and wherein M₁₂ and S₁₀ are not complementary bases.
84. The method of claim 81 wherein said non-naturally occurring miR-196a-2 comprises the sequence:

wherein M is said mature strand sequence and S is said star strand sequence.